Re: quieting HBase metrics

2013-09-24 Thread Bing Jiang
Hi, Ron. Indeed, I have rewritten the implementation of org.apache.hadoop.hbase.regionserver.metrics.RegionServerMetrics and org.apache.hadoop.hbase.regionserver.metrics.RegionServerDynamicMetrics. The major changes are as follows: 1) change the context name for HBase so each has a separate name;

Hbase Compression

2013-09-24 Thread aiyoh79
Hi, I am using HBase 0.94.11 and I feel a bit confused when looking at the log file below: 13/09/24 13:11:00 INFO regionserver.Store: Flushed, sequenceid=687077, memsize=122.1m, into tmp file hdfs://192.168.123.123:54310/hbase/usertable/b19289cf9b1400

How To Create Partitions In Hbase Table As Like Hive Table Partitions

2013-09-24 Thread hadoopyod
We are planning to migrate from CDH3 to CDH4. As part of this migration we are also planning to bring HBase into our system, because it also applies updates to the data; in CDH3 we are using Hive as our warehouse. Here we have the major problem in the migration: Hive supports partitions on tables. And our

Spatial data posting in HBase

2013-09-24 Thread cto
Hi, I am very new to HBase. Could you please let me know how to insert spatial data (Latitude/Longitude) into HBase using Java. -- View this message in context: http://apache-hbase.679495.n3.nabble.com/Spatial-data-posting-in-HBase-tp4051123.html Sent from the HBase User mailing list

Re: Spatial data posting in HBase

2013-09-24 Thread Ted Yu
There are plenty of examples in the unit tests, e.g.: Put put = new Put(Bytes.toBytes(row + String.format("%1$04d", i))); put.add(family, null, value); table.put(put); value can be obtained through Bytes.toBytes(). table is an HTable. Cheers On Tue, Sep 24, 2013 at 4:15 AM, cto
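
Ted's snippet uses String.format to zero-pad a numeric component of the row key so that keys sort as bytes in the same order as their numbers, which is what HBase range scans rely on. A minimal, HBase-free sketch of that key-building pattern (the `row` prefix and the width of 4 come from the snippet; everything else is illustrative):

```java
import java.nio.charset.StandardCharsets;

public class RowKeys {
    // Build a row key like "row0007": zero-padding keeps lexicographic
    // byte order aligned with numeric order across the whole range.
    static byte[] rowKey(String prefix, int i) {
        return (prefix + String.format("%1$04d", i)).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "row0007" sorts before "row0123", matching 7 < 123.
        System.out.println(new String(rowKey("row", 7), StandardCharsets.UTF_8));
        System.out.println(new String(rowKey("row", 123), StandardCharsets.UTF_8));
    }
}
```

The same idea applies to the original spatial question: format latitude/longitude into fixed-width strings before concatenating them into a key, so scans over a coordinate range behave predictably.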

Re: Write TimeSeries Data and Do Time Based Range Scans

2013-09-24 Thread anil gupta
Inline On Mon, Sep 23, 2013 at 6:15 PM, Shahab Yunus shahab.yu...@gmail.com wrote: Yeah, I saw that. In fact that is why I recommended it to you, as I couldn't infer from your email whether you had already gone through that source or not. Yes, I was aware of that article. But my read

Re: Write TimeSeries Data and Do Time Based Range Scans

2013-09-24 Thread Shahab Yunus
I only know of the links already embedded in the blog page that I sent you, plus this: https://groups.google.com/forum/#!forum/hbasewd Regards, Shahab On Tue, Sep 24, 2013 at 11:12 AM, anil gupta anilgupt...@gmail.com wrote: Inline On Mon, Sep 23, 2013 at 6:15 PM, Shahab Yunus

Re: Why is this region compacting?

2013-09-24 Thread Jean-Marc Spaggiari
Hi Tom, What is your table schema for this region? How many CFs? Also, what do you have in the logs for this table? Thanks, JM 2013/9/24 Tom Brown tombrow...@gmail.com I have a region that is very small, only 5MB. Despite its size, it has 24 store files. The logs show that it's compacting

Re: Why is this region compacting?

2013-09-24 Thread Bharath Vissapragada
It would help if you can show your RS log (via pastebin?). Are there frequent flushes for this region too? On Tue, Sep 24, 2013 at 9:20 PM, Tom Brown tombrow...@gmail.com wrote: I have a region that is very small, only 5MB. Despite its size, it has 24 store files. The logs show that it's

Re: Why is this region compacting?

2013-09-24 Thread Tom Brown
There is one column family, d. Each row has about 10 columns, and each row's total data size is less than 2K. Here is a small snippet of logs from the region server: http://pastebin.com/S2jE4ZAx --Tom On Tue, Sep 24, 2013 at 9:59 AM, Bharath Vissapragada bhara...@cloudera.com wrote: It

Re: Why is this region compacting?

2013-09-24 Thread Jean-Marc Spaggiari
Can you paste logs from a bit before that? To see if anything triggered the compaction? Before the 1M compaction entries. Also, what is your setup? Are you running in Standalone? Pseudo-Dist? Fully-Dist? Thanks, JM 2013/9/24 Tom Brown tombrow...@gmail.com There is one column family, d. Each row

Re: HBase Table Row Count Optimization - A Solicitation For Help

2013-09-24 Thread James Birchfield
Just wanted to follow up here with a little update. We enabled the Aggregation coprocessor on our dev cluster. Here are the quick timing stats. Tables: 565 Total Rows: 2,749,015,957 Total Time (to count): 52m:33s Will be interesting to see how this fares against our production clusters with
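
For reference, the Aggregation coprocessor James mentions is loaded by registering AggregateImplementation on the region servers. A sketch of the hbase-site.xml fragment for 0.94 (the class and property names are the stock ones; adjust to your deployment):

```xml
<property>
  <name>hbase.coprocessor.user.region.classes</name>
  <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>
```

Region servers must be restarted after this change; the client side then counts rows via AggregationClient#rowCount.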

Re: Why is this region compacting?

2013-09-24 Thread Tom Brown
My cluster is fully distributed (2 regionserver nodes). Here is a snippet of log entries that may explain why it started: http://pastebin.com/wQECif8k. I had to go back 2 days to find when it started for this region. This is not the only region experiencing this issue (but this is the smallest

Re: Write TimeSeries Data and Do Time Based Range Scans

2013-09-24 Thread James Taylor
Hey Anil, The solution you've described is the best we've found for Phoenix (inspired by the work of Alex at Sematext). You can do all of this in a few lines of SQL: CREATE TABLE event_data( who VARCHAR, type SMALLINT, id BIGINT, when DATE, payload VARBINARY CONSTRAINT pk PRIMARY KEY

Re: Why is this region compacting?

2013-09-24 Thread Jean-Marc Spaggiari
Strange. A few questions then. 1) What is your Hadoop version? 2) Are the clocks on all your servers synched with NTP? 3) What is your table definition? Bloom filters, etc.? This is the reason why it keeps compacting: 2013-09-24 10:04:00,548 INFO

Re: Why is this region compacting?

2013-09-24 Thread Jean-Marc Spaggiari
Another important piece of information that might be the root cause of this issue... Do you have any TTL defined for this table? JM 2013/9/24 Jean-Marc Spaggiari jean-m...@spaggiari.org Strange. A few questions then. 1) What is your Hadoop version? 2) Are the clocks on all your servers synched with

Re: Why is this region compacting?

2013-09-24 Thread Tom Brown
1. Hadoop version is 1.1.2. 2. All servers are synched with NTP. 3. Table definition is: 'compound0', { NAME => 'd', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', TTL => '864', KEEP_DELETED_CELLS => 'false',

Re: Hbase Compression

2013-09-24 Thread Ted Yu
memstoreSize of 128.2m was recorded at the beginning of HRegion#internalFlushcache(). After the flush, memstoreSize became 48.0m. Cheers On Tue, Sep 24, 2013 at 3:50 AM, aiyoh79 tcheng...@gmail.com wrote: Hi, I am using hbase 0.94.11 and i feel a bit confuse when looking at the log file

Re: Why is this region compacting?

2013-09-24 Thread Jean-Marc Spaggiari
TTL seems to be fine. -1 is the default value for TimeRangeTracker.maximumTimestamp. Can you run: hadoop fs -lsr hdfs://hdpmgr001.pse.movenetworks.com:8020/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/ Thanks, JM 2013/9/24 Tom Brown tombrow...@gmail.com 1. Hadoop version is 1.1.2.

Re: Why is this region compacting?

2013-09-24 Thread Tom Brown
-rw--- 1 hadoop supergroup 2194 2013-09-21 14:32 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/014ead47a9484d67b55205be16802ff1 -rw--- 1 hadoop supergroup 31321 2013-09-24 05:49 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1305d625bd4a4be39a98ae4d91a66140

Re: Hbase Compression

2013-09-24 Thread Jean-Daniel Cryans
On flushing we do some cleanup, like removing deleted data that was already in the MemStore or extra versions. Could it be that you are overwriting recently written data? 48MB is the size of the Memstore that accumulated while the flushing happened. J-D On Tue, Sep 24, 2013 at 3:50 AM, aiyoh79

Re: Why is this region compacting?

2013-09-24 Thread Tom Brown
Same thing in pastebin: http://pastebin.com/tApr5CDX On Tue, Sep 24, 2013 at 11:18 AM, Tom Brown tombrow...@gmail.com wrote: -rw--- 1 hadoop supergroup 2194 2013-09-21 14:32 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/014ead47a9484d67b55205be16802ff1 -rw--- 1 hadoop

Re: Why is this region compacting?

2013-09-24 Thread Jean-Marc Spaggiari
So. Looking at the code, this, for me, sounds like a bug. I will try to reproduce it locally. It seems to be related to the combination of TTL + BLOOM. Creating a table for that right now; will keep you posted very shortly. JM 2013/9/24 Tom Brown tombrow...@gmail.com -rw--- 1 hadoop

Re: Why is this region compacting?

2013-09-24 Thread Sergey Shelukhin
To mitigate, you can change hbase.store.delete.expired.storefile to false on one region server, or for the entire table, and restart this RS. This will trigger a different compaction, hopefully. We'd need to find what the bug is. My gut feeling (which is known to be wrong often) is that it has to do
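
The workaround Sergey describes would go into hbase-site.xml on the affected region server (followed by a restart of that RS); a sketch, using the property name exactly as given in the thread:

```xml
<property>
  <name>hbase.store.delete.expired.storefile</name>
  <value>false</value>
</property>
```

This disables the compaction path that deletes whole store files once their TTL has expired, which is the code path suspected of looping here.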

Re: Why is this region compacting?

2013-09-24 Thread Sergey Shelukhin
Yeah, I think c3580bdb62d64e42a9eeac50f1c582d2 store file is a good example. Can you grep for c3580bdb62d64e42a9eeac50f1c582d2 and post the log just to be sure? Thanks. It looks like an interaction of deleting expired files and // Create the writer even if no kv(Empty store file is also

Re: Why is this region compacting?

2013-09-24 Thread Jean-Marc Spaggiari
We get -1 because of this: byte [] timerangeBytes = metadataMap.get(TIMERANGE_KEY); if (timerangeBytes != null) { this.reader.timeRangeTracker = new TimeRangeTracker(); Writables.copyWritable(timerangeBytes, this.reader.timeRangeTracker); }

Re: Why is this region compacting?

2013-09-24 Thread Jean-Marc Spaggiari
One more, Tom. When you have been able to capture the HFile locally, please run the HFile class on it to see the number of keys (is it empty?) and the other specific information. bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -m -s -v -f HFILENAME Thanks, JM 2013/9/24 Jean-Marc

Re: Why is this region compacting?

2013-09-24 Thread Tom Brown
/usr/lib/hbase/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -m -s -v -f /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/fca0882dc7624342a8f4fce4b89420ff 13/09/24 12:33:40 INFO util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32 Scanning -

Re: Why is this region compacting?

2013-09-24 Thread Jean-Marc Spaggiari
Can you try with fewer parameters and see if you are able to get something from it? This exception is caused by printMeta, so if you remove -m it should be ok. However, printMeta was what I was looking for ;) getFirstKey for this file seems to return null. So it might simply be an empty file,

Re: Why is this region compacting?

2013-09-24 Thread Tom Brown
Yes, it is empty. 13/09/24 13:03:03 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 2.9g 13/09/24 13:03:03 ERROR metrics.SchemaMetrics: Inconsistent configuration. Previous configuration for using table name in metrics: true, new configuration: false 13/09/24 13:03:03 WARN

Re: Why is this region compacting?

2013-09-24 Thread Jean-Marc Spaggiari
Hi Tom, Thanks for this information and the offer. I think we have enough to start looking at this issue. I'm still trying to reproduce it locally. In the meantime, I sent a patch to fix the NullPointerException you faced before. I will post back here if I'm able to reproduce. Have you tried

Re: Why is this region compacting?

2013-09-24 Thread Tom Brown
I tried the workaround, and it is working very well. The number of store files for all regions is now sane (went from about 8000 total store files to 1000), and scans are now much more efficient. Thanks for all your help, Jean-Marc and Sergey! --Tom On Tue, Sep 24, 2013 at 2:11 PM, Jean-Marc

[ANNOUNCE] HBase 0.94.12 is available for download

2013-09-24 Thread lars hofhansl
The HBase Team is pleased to announce the immediate release of HBase 0.94.12. Download it from your favorite Apache mirror [1]. This release has also been pushed to Apache's maven repository. All previous 0.92.x and 0.94.x releases can be upgraded to 0.94.12 via a rolling upgrade without downtime,

Re: Is there a problem with having 4000 tables in a cluster?

2013-09-24 Thread Jean-Marc Spaggiari
Hi Jeremy, I don't see any issue with HBase handling 4000 tables. However, I don't think it's the best solution for your use case. JM 2013/9/24 jeremy p athomewithagroove...@gmail.com Short description: I'd like to have 4000 tables in my HBase cluster. Will this be a problem? In general,

Re: Is there a problem with having 4000 tables in a cluster?

2013-09-24 Thread Varun Sharma
It's better to do some salting of your keys for the reduce phase. Basically, make your key something like KeyHash + Key and then decode it in your reducer and write to HBase. This way you avoid the hotspotting problem in HBase due to MapReduce sorting. On Tue, Sep 24, 2013 at 2:50 PM, Jean-Marc

HMaster is running, but can't create table: client says it's initializing, and -debug says it's zookeeper related.

2013-09-24 Thread Jay Vyas
hi hbase! I'm getting mixed messages in the errors when creating a table in a simple HBase two node cluster. 1) My HMaster is clearly running: 21519 Manager 14748 HMaster 25110 Jps 9887 QuorumPeerMain 15473 HRegionServer 14062 ServiceMain 5702 Bootstrap 2) But when I try to create a table:

Re: HMaster is running, but can't create table: client says it's initializing, and -debug says it's zookeeper related.

2013-09-24 Thread Ted Yu
Are you able to view the Master Web UI? What exceptions do you see in the master log? Cheers On Tue, Sep 24, 2013 at 3:26 PM, Jay Vyas jayunit...@gmail.com wrote: hi hbase! I'm getting mixed messages in the errors when creating a table in a simple HBase two node cluster. 1) My HMaster is

Re: Is there a problem with having 4000 tables in a cluster?

2013-09-24 Thread jeremy p
Varun : I'm familiar with that method of salting. However, in this case, I need to do filtered range scans. When I do a lookup for a given WORD at a given POSITION, I'll actually be doing a regex on a range of WORDs at that POSITION. If I salt the keys with a hash, the WORDs will no longer be

Re: Why is this region compacting?

2013-09-24 Thread Jean-Marc Spaggiari
Hi Tom, Thanks for reporting this and for providing all this information. I have attached a patch on the JIRA that Sergey opened. This will need to be reviewed, and we will need a committer to push it if it's accepted. JM 2013/9/24 Tom Brown tombrow...@gmail.com I tried the workaround, and

Re: Is there a problem with having 4000 tables in a cluster?

2013-09-24 Thread Jean-Marc Spaggiari
If you have a fixed-length key like: <number>_AAA, where <number> is a number up to 4000 and AAA is your word, then simply split by the number? Then when you insert each line, it will write to 4000 different regions, which can be hosted on 4000 different servers if you have that many. And there

Re: Is there a problem with having 4000 tables in a cluster?

2013-09-24 Thread Varun Sharma
So you should salt the keys in the reduce phase, but you do not salt the keys in HBase. That basically means that reducers do not see the keys in sorted order, but they do see all the values for a specific key together. So the hash essentially is a trick that stays within the MapReduce and does not make

Re: Is there a problem with having 4000 tables in a cluster?

2013-09-24 Thread Jean-Marc Spaggiari
Who talked about salting in the reducer? Why do you want to do that? The use case did not even mention any reduce phase. It seems we need more details on what Jeremy wants to achieve. JM 2013/9/24 Varun Sharma va...@pinterest.com So you should salt the keys in the reduce phase, but you do not salt

Re: Loading hbase-site.xml settings from Hadoop MR job

2013-09-24 Thread Renato Marroquín Mogrovejo
Hi Dolan, 2013/9/23 Dolan Antenucci antenucc...@gmail.com Hi Renato, Can you clarify your recommendation? Sorry about this. I will try to be more helpful (: Currently I've added the directory where my hbase-site.xml file lives (/etc/hbase/conf/) to my Hadoop classpath (as described

Re: Is there a problem with having 4000 tables in a cluster?

2013-09-24 Thread jeremy p
Perhaps there has been some confusion. I'm concerned about hotspotting on read, not on write. So, for example, let's say it's time for me to process a 'document'. For the sake of this example, let's say the words are all 10 characters long. I spin up 200 mapreduce jobs, each one takes a 'line'

Export API using start and stop row key !

2013-09-24 Thread karunakar
Hi Experts, I would like to fetch data from an HBase table using the map reduce Export API. I see that I can fetch data using start and stop time, but I don't see any information regarding start and stop row keys. Can any expert guide me or give me an example in order to fetch the first 1000 rows (or start and

Re: Is there a problem with having 4000 tables in a cluster?

2013-09-24 Thread Michael Segel
Since different people use different terms... Salting is BAD. (You need to understand what is implied by the term salt.) What you really want to do is take the hash of the key, and then truncate the hash. Use that instead of a salt. Much better than a salt. Sent from a remote device. Please
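
Michael's suggestion, sketched with stdlib Java (the choice of MD5 and a 2-byte truncation are illustrative, not from the thread): prefix each key with a short truncation of its own hash. Because the prefix is derived deterministically from the key, point reads can recompute it, while writes spread across the key space.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashPrefix {
    // Prefix a key with the first prefixBytes bytes of its MD5 hash.
    // Deterministic: the same key always gets the same prefix, so a
    // reader can rebuild the full row key without a lookup table.
    static byte[] prefixed(byte[] key, int prefixBytes) {
        try {
            byte[] hash = MessageDigest.getInstance("MD5").digest(key);
            byte[] out = new byte[prefixBytes + key.length];
            System.arraycopy(hash, 0, out, 0, prefixBytes);
            System.arraycopy(key, 0, out, prefixBytes, key.length);
            return out;
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e); // MD5 is guaranteed by the JRE
        }
    }

    public static void main(String[] args) {
        byte[] k = "searchword".getBytes(StandardCharsets.UTF_8);
        System.out.println(prefixed(k, 2).length); // original length + 2
    }
}
```

Note the trade-off raised elsewhere in the thread: any hash-derived prefix, truncated or not, breaks contiguous range scans over the original keys.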

FuzzyRowFilter missing some keys

2013-09-24 Thread Kiru Pakkirisamy
I now have string keys padded with spaces to a fixed size (40). The FuzzyRowFilter is missing some keys. Any ideas why this would happen? If I do a 'get' in the hbase shell, I can see the row. Regards, - kiru
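
For context, FuzzyRowFilter in 0.94 expects fixed-length keys plus a per-byte mask, where 0 marks a byte that must match and 1 marks a don't-care position. A stdlib sketch of the space-padding Kiru describes, together with mask construction (the width of 40 is from the message; the fixed-prefix length is illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FuzzyKeys {
    // Right-pad a key with spaces to a fixed width, as described above.
    static byte[] pad(String key, int width) {
        return String.format("%-" + width + "s", key).getBytes(StandardCharsets.UTF_8);
    }

    // Fuzzy-info mask (0.94 semantics): 0 = byte must match exactly,
    // 1 = don't care. Here only the first fixedPrefix bytes are fixed.
    static byte[] mask(int fixedPrefix, int width) {
        byte[] m = new byte[width];
        Arrays.fill(m, (byte) 1);
        Arrays.fill(m, 0, fixedPrefix, (byte) 0);
        return m;
    }

    public static void main(String[] args) {
        System.out.println(pad("abc", 40).length); // always 40
    }
}
```

A common pitfall with this setup is a fixed position in the mask that falls on a padding byte: the filter then requires an exact space there, so any key whose real data extends past that position is silently skipped.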

Re: Hbase Compression

2013-09-24 Thread aiyoh79
Hi Ted, Thanks for the reply; now I understand the 3rd entry of that log file. Aiyoh79 Ted Yu-3 wrote memstoreSize of 128.2m was recorded at the beginning of HRegion#internalFlushcache(). After the flush, memstoreSize became 48.0m. Cheers On Tue, Sep 24, 2013 at 3:50 AM, aiyoh79

Re: Hbase Compression

2013-09-24 Thread aiyoh79
Hi J-D, I am doing some benchmarking using YCSB, and the log entries retrieved were during the data loading stage. So I don't think there is any data being deleted or overwritten. Aiyoh79 Jean-Daniel Cryans wrote On flushing we do some cleanup, like removing deleted data that was already in the

Re: FuzzyRowFilter missing some keys

2013-09-24 Thread Ted Yu
Can you provide a bit more detail ? If you can reproduce this in a unit test, that would be easier to troubleshoot. Thanks On Sep 24, 2013, at 6:25 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com wrote: I now have string keys padded with spaces to a fixed size (40). The FuzzyRowFilter is

Re: Why is this region compacting?

2013-09-24 Thread Sergey Shelukhin
Meanwhile you can mitigate as specified above, by temporarily disabling expired file deletion. Please report if it doesn't work... On Tue, Sep 24, 2013 at 4:08 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Tom, Thanks for reporting this and for providing all this information.

importtsv tool can import data.lzo?

2013-09-24 Thread kun yan
Hi all, version HBase 0.94.11. Can I use the importtsv tool to import a data file (the data file is LZO compressed, file.txt.lzo) from HDFS into HBase? I have enabled the LZO compression algorithm in HBase -- In the Hadoop world, I am just a novice, exploring the entire Hadoop ecosystem; I hope one day I can

Warning messages in hbase logs

2013-09-24 Thread Vimal Jain
Hi, I have been seeing some warning messages in the HRegion log file for quite a few days. The message is: *WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {processingtimems:12994,call:multi(org.apache.hadoop.hbase.client.MultiAction@602cdaf7), rpc version=1, client version=29,

Re: Warning messages in hbase logs

2013-09-24 Thread Ted Yu
See http://hbase.apache.org/book.html#ops.slow.query On Tue, Sep 24, 2013 at 9:36 PM, Vimal Jain vkj...@gmail.com wrote: Hi, I am seeing some warning message in Hregion log file for quite a few days. The message is : *WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow):

Re: Warning messages in hbase logs

2013-09-24 Thread Kiru Pakkirisamy
Are you running many concurrent clients? I had a similar problem when running on 0.94.x and I moved to 0.95.2 for this reason (see HBASE-9410). Regards, - kiru From: Vimal Jain vkj...@gmail.com To: user@hbase.apache.org Sent: Tuesday, September 24, 2013 9:36

Re: Warning messages in hbase logs

2013-09-24 Thread Vimal Jain
Thanks Ted and Kiru. Ted, any place where I can debug that JSON in detail? Kiru, I have one multi-threaded client which reads/writes to HBase. On Wed, Sep 25, 2013 at 10:16 AM, Kiru Pakkirisamy kirupakkiris...@yahoo.com wrote: Are you running many concurrent clients? I had a similar

Re: Is there a problem with having 4000 tables in a cluster?

2013-09-24 Thread Varun Sharma
Okay, thanks for the explanation. You can hash or salt (as many people say) the keys to avoid the hot spotting problem. What this means is that you push the part that issues filtered range queries to HBase into the reduce phase. The idea is this: 1) You get your query 'Pos_WORD' in mapper and
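
Varun's trick, as I read it: the mapper emits hash-prefixed keys so the shuffle spreads lexicographically adjacent keys across reducers, and the reducer strips the prefix before writing, so the HBase table itself keeps plain, scannable row keys. A stdlib sketch (the bucket count, prefix format, and separator are illustrative):

```java
public class SaltedShuffle {
    // Mapper side: emit "<bucket>|<key>" so the MapReduce sort distributes
    // adjacent keys across reducers instead of hotspotting one of them.
    static String encode(String key, int buckets) {
        int bucket = Math.abs(key.hashCode() % buckets);
        return String.format("%04d|%s", bucket, key);
    }

    // Reducer side: strip the bucket prefix; only the original key is
    // written to HBase, so range scans on the table are unaffected.
    static String decode(String salted) {
        return salted.substring(salted.indexOf('|') + 1);
    }

    public static void main(String[] args) {
        String salted = encode("0001_searchword", 16);
        System.out.println(decode(salted)); // round-trips to the original key
    }
}
```

The key design point is that the salt lives only inside the MapReduce job: it never reaches the table, which is what distinguishes this from salting the HBase row keys themselves.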

Re: Warning messages in hbase logs

2013-09-24 Thread Ted Yu
The slow response was from this method in HRegionServer: public <R> MultiResponse multi(MultiAction<R> multi) throws IOException { You can capture a few jstacks of the HRegionServer process and see what could be the cause. You can pastebin the stack traces. What HBase version are you using?

Re: Is there a problem with having 4000 tables in a cluster?

2013-09-24 Thread Michael Webster
The biggest issue I see with so many tables is that the region count could get quite large. With 4000 tables, you will need at least that many regions, not even accounting for splitting the regions/growth. Forgive the speculation, but it almost sounds like you want an inverted index. Could you not