rowcount in a specified timerange

2014-04-29 Thread Hansi Klose
Hi, is it possible to count the rows in a table within a specified time range? I found [--range=[startKey],[endKey]], but I need to count between specified timestamps. Regards, Hansi

Re: HBase checksum vs HDFS checksum

2014-04-29 Thread Krishna Rao
Hi Ted, I had read those, but I'm confused about how this will affect non-HBase HDFS data. With HDFS checksumming off, won't it affect data integrity? Krishna On 24 April 2014 15:54, Ted Yu yuzhih...@gmail.com wrote: Please take a look at the following:

Re: HBase checksum vs HDFS checksum

2014-04-29 Thread Anoop John
HBase using its own checksum handling doesn't directly affect HDFS; HDFS will still maintain its checksum info. The difference is at read time: HBase will open the reader with checksum validation off and will do checksum validation on its own. So using HBase-handled checksums in a cluster should not

Re: rowcount in a specified timerange

2014-04-29 Thread Samir Ahmic
Hi Hansi, take a look at https://github.com/apache/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java . You can modify it and use scan.setTimeRange(long minStamp, long maxStamp), which will give you the option to count rows in a specific time range. Another
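For reference, a minimal client-side sketch of the same idea (the table name "mytable" is hypothetical; the 0.94-era HTable/Scan API is used, and for large tables the MapReduce RowCounter approach linked above scales better):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;

public class TimeRangeRowCount {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");            // hypothetical table name
    try {
      Scan scan = new Scan();
      scan.setTimeRange(1398700800000L, 1398787200000L);   // [minStamp, maxStamp), illustrative values
      scan.setFilter(new FirstKeyOnlyFilter());            // one KeyValue per row is enough for counting
      scan.setCaching(1000);
      long count = 0;
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result r : scanner) {
          count++;
        }
      } finally {
        scanner.close();
      }
      System.out.println("Rows in range: " + count);
    } finally {
      table.close();
    }
  }
}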

Re: HBase checksum vs HDFS checksum

2014-04-29 Thread Krishna Rao
Thank you for your reply Anoop. However, the confusion is, unfortunately, still there because of the following (from here: http://hbase.apache.org/book.html#perf.hdfs.configs.localread ): For optimal performance when short-circuit reads are enabled, it is recommended that HDFS checksums are

How to implement sorting in HBase scans for a particular column

2014-04-29 Thread Vikram Singh Chandel
Hi, we have a requirement in which we have to get the scan result sorted on a particular column, e.g. *Get Details of Authors sorted by their Publication Count. Limit: 1000* *Row Key is an MD5 hash of the Author Id* Number of records: 8.2 million rows for 3 years of data (sample dataset; actual data set

Re: How to implement sorting in HBase scans for a particular column

2014-04-29 Thread Ted Yu
Have you looked at Apache Phoenix? Cheers On Apr 29, 2014, at 2:13 AM, Vikram Singh Chandel vikramsinghchan...@gmail.com wrote: Hi, we have a requirement in which we have to get the scan result sorted on a particular column, e.g. *Get Details of Authors sorted by their Publication

Re: How to implement sorting in HBase scans for a particular column

2014-04-29 Thread Vikram Singh Chandel
Yes, we looked at it, but that was back in November/December 2013 when it had a lot of issues, which is why we decided not to use it. We built our solution design on HBase alone, so we are looking for a better solution. Thanks On Tue, Apr 29, 2014 at 5:46 PM, Ted Yu yuzhih...@gmail.com

ZK issue on one single RS.

2014-04-29 Thread Jean-Marc Spaggiari
Hi, I faced a strange issue today. I stopped my cluster to merge some regions. When I tried to restart it, one server got stuck. If I try to stop the cluster, this server ignores the request and stays stuck. Then if I restart, the cluster comes back without this server, saying that it's already

Re: ZK issue on one single RS.

2014-04-29 Thread Ted Yu
Can you give us the versions of HBase and ZooKeeper? bq. Connexion ré-initialisée par le correspondant. Google Translate says: Connection re-initialized by the corresponding. Is the above correct? On Tue, Apr 29, 2014 at 7:37 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi, I faced

Re: ZK issue on one single RS.

2014-04-29 Thread Jean-Marc Spaggiari
HBase 0.94.19. I have the RC1 jars but I guess they are the same as the release. ZK 3.4.3. And for the translation, I would say something like "Connection reset by peer"? ;) Seems that ZK closes the connection for this specific RS. I'm not really getting why... JM 2014-04-29 10:43 GMT-04:00 Ted Yu

Aw: Re: taking snapshot's creates to many TCP CLOSE_WAIT handles on the hbase master server

2014-04-29 Thread Hansi Klose
Hi all, sorry for the late answer. I configured the hbase-site.conf like this: <property><name>dfs.client.socketcache.capacity</name><value>0</value></property> <property><name>dfs.datanode.socket.reuse.keepalive</name><value>0</value></property> and restarted the hbase master and all

Re: Re: taking snapshot's creates to many TCP CLOSE_WAIT handles on the hbase master server

2014-04-29 Thread Stack
On Tue, Apr 29, 2014 at 8:15 AM, Hansi Klose hansi.kl...@web.de wrote: Hi all, sorry for the late answer. I configured the hbase-site.conf like this: <property><name>dfs.client.socketcache.capacity</name><value>0</value></property> <property>

Re: How to implement sorting in HBase scans for a particular column

2014-04-29 Thread James Taylor
Hi Vikram, I see you sent the Phoenix mailing list a question back in Dec on how to use Phoenix 2.1.2 with Hadoop 2 for HBase 0.94. Looks like you were having trouble building Phoenix with the hadoop2 profile. In our 3.0/4.0 releases we bundle the Phoenix jars pre-built with both hadoop1 and hadoop2, so

How to Create a Column in Hbase using Hbase Client API

2014-04-29 Thread Chamika Kasun
Hi all, what I want to do is first create a column family and then create a column for data to be added later. I cannot use the PUT command, as it requires a value parameter (because the idea of the application is to first initialize the table, and data will be added

Re: How to Create a Column in Hbase using Hbase Client API

2014-04-29 Thread Ted Yu
When creating your table, you specify the column families in the table (code snippet from TestFromClientSide3.java): HTableDescriptor htd = new HTableDescriptor(hTable.getTableDescriptor()); HColumnDescriptor hcd = new HColumnDescriptor(htd.getFamily(FAMILY)); htd.addFamily(hcd);
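For the original question (create the table and its family up front, add data later), a minimal sketch using the 0.94-era admin API might look like the following; the table and family names are only illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateTableExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      HTableDescriptor htd = new HTableDescriptor("mytable");  // hypothetical table name
      htd.addFamily(new HColumnDescriptor("cf"));              // only the family is declared up front
      admin.createTable(htd);                                  // columns (qualifiers) come later, with the data
    } finally {
      admin.close();
    }
  }
}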

Re: HBase checksum vs HDFS checksum

2014-04-29 Thread Stack
On Tue, Apr 29, 2014 at 1:54 AM, Krishna Rao krishnanj...@gmail.com wrote: Thank you for your reply Anoop. However, the confusion is, unfortunately, still there because of the following (from here: http://hbase.apache.org/book.html#perf.hdfs.configs.localread ): For optimal performance when

Re: How to Create a Column in Hbase using Hbase Client API

2014-04-29 Thread Chamika Kasun
Yes, this code is correct. What the code does is only create a column family. What I want to do is create several columns inside the column family. In the documentation http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html for the class HTableDescriptor it

Re: How to Create a Column in Hbase using Hbase Client API

2014-04-29 Thread Matteo Bertozzi
HBase does not have the concept of static columns. You just have groups, the column families, and then each row has a set of qualifiers. So you don't need to specify the set of columns (qualifiers) on initialization; you just add/update rows with the specified column, so you end up doing something
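A minimal sketch of what that ends up looking like, with columns created simply by writing them (table, family and qualifier names here are illustrative, not from the thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class AddColumnsByPut {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");   // hypothetical table with family "cf"
    try {
      Put put = new Put(Bytes.toBytes("row1"));
      // Each qualifier comes into existence the first time it is written; nothing is declared beforehand.
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes("value1"));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("col2"), Bytes.toBytes("value2"));
      table.put(put);
    } finally {
      table.close();
    }
  }
}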

Re: HBase checksum vs HDFS checksum

2014-04-29 Thread Stack
On Tue, Apr 29, 2014 at 11:53 AM, Stack st...@duboce.net wrote: On Tue, Apr 29, 2014 at 1:54 AM, Krishna Rao krishnanj...@gmail.com wrote: Thank you for your reply Anoop. However, the confusion is, unfortunately, still there because of the following (from

Re: How to Create a Column in Hbase using Hbase Client API

2014-04-29 Thread Chamika Kasun
Thank you for the immediate reply. (Y) On Wed, Apr 30, 2014 at 12:44 AM, Matteo Bertozzi theo.berto...@gmail.com wrote: HBase does not have the concept of static columns. You just have groups, the column families, and then each row has a set of qualifiers. So you don't need to specify the

Re: RegionServer stuck in internalObtainRowLock forever - HBase 0.94.7

2014-04-29 Thread Asaf Mesika
We had this issue again in production. We had to shut down the region server. Restarting didn't help since this RS was bombarded with write requests and exec-coprocessor requests, which made it open regions at the rate of 1 region every 2 minutes or so. Do you think it's related to this jira:

Re: ZK issue on one single RS.

2014-04-29 Thread Stack
Anything in the ZK log when the RS connects, JMS? On Tue, Apr 29, 2014 at 7:37 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi, I faced a strange issue today. I stopped my cluster to merge some regions. When I tried to restart it, one server got stuck. If I try to stop the

Re: ZK issue on one single RS.

2014-04-29 Thread Mikhail Antonov
1. 2014-04-29 10:32:28,876 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, will automatically reconnect when needed. What about ZK metrics, maybe it's timing out or something (I mean echo srvr | nc

Need help with row and column design

2014-04-29 Thread Software Dev
Hey all. I have some questions regarding row key and column design. We want to calculate some metrics based on our page views broken down by minute, day, month and year. We also want this broken down by country, and to have the ability to filter by some other attributes such as the sex of the user or

Help with row and column design

2014-04-29 Thread Software Dev
Hey all. I have some questions regarding row key and column design. We want to calculate some metrics based on our page views broken down by hour, day, month and year. We also want this broken down by country, and to have the ability to filter by some other attributes such as the sex of the user or

Re: Need help with row and column design

2014-04-29 Thread Ted Yu
The initial row key design would result in a hot spot (w.r.t. writes). Is the user id part of the row key? Have you looked at Sematext's HBaseWD library? Lastly, Apache Phoenix may fit your needs. On Tue, Apr 29, 2014 at 3:25 PM, Software Dev static.void@gmail.com wrote: Hey all. I have some

Re: Need help with row and column design

2014-04-29 Thread Software Dev
Sorry, I thought I deleted this message as it got cut off halfway when I was writing it. Can you please look at my other post? On Tue, Apr 29, 2014 at 3:36 PM, Ted Yu yuzhih...@gmail.com wrote: The initial row key design would result in hot spot (w.r.t. writes) Is user id part of row key ?

Re: Help with row and column design

2014-04-29 Thread Software Dev
Someone in another post mentioned hotspotting. I guess I could reverse the row keys to prevent this? On Tue, Apr 29, 2014 at 3:34 PM, Software Dev static.void@gmail.com wrote: Hey all. I have some questions regarding row key and column design. We want to calculate some metrics based

RE: Help with row and column design

2014-04-29 Thread Rendon, Carlos (KBB)
I've created a similar system using a rowkey like: (hash of date) - date The downside is it still has a hotspot when inserting, but when reading a range of time it does not. My use case was geared towards speeding up lots of reads. Column qualifiers are just the collection of items you are
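A rough sketch of how that write path might look (hash-prefixed date key, one qualifier per aggregate, counter values bumped with Increment); the table, family and qualifier names and the 4-character prefix length are assumptions for illustration, not necessarily what Carlos built:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.MD5Hash;

public class PageViewCounter {
  // rowkey = short hash of the date + the date itself, so a reader that knows the date can recompute the key
  static byte[] rowKeyFor(String date) {
    String prefix = MD5Hash.getMD5AsHex(Bytes.toBytes(date)).substring(0, 4);
    return Bytes.add(Bytes.toBytes(prefix), Bytes.toBytes(date));
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "pageviews");   // hypothetical table with family "m"
    try {
      Increment inc = new Increment(rowKeyFor("2014-04-29"));
      // one qualifier per aggregate being tracked; the values are counters
      inc.addColumn(Bytes.toBytes("m"), Bytes.toBytes("total_usa"), 1L);
      inc.addColumn(Bytes.toBytes("m"), Bytes.toBytes("total_female_usa"), 1L);
      table.increment(inc);
    } finally {
      table.close();
    }
  }
}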

Re: Help with row and column design

2014-04-29 Thread Liam Slusser
Here are some links that helped me design my keys... http://www.appfirst.com/blog/best-practices-for-managing-hbase-in-a-high-write-environment/ http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/

Re: Help with row and column design

2014-04-29 Thread Software Dev
"The downside is it still has a hotspot when inserting, but when reading a range of time it does not." How can you do a scan query between dates when you hash the date? "Column qualifiers are just the collection of items you are aggregating on. Values are increments." In your case qualifiers

RE: Help with row and column design

2014-04-29 Thread Rendon, Carlos (KBB)
You don't do a scan, you do a series of gets, which I believe you can batch into one call. Last 5 days query in pseudocode: res1 = Get( hash(2014-04-29) + 2014-04-29) res2 = Get( hash(2014-04-28) + 2014-04-28) res3 = Get( hash(2014-04-27) + 2014-04-27) res4 = Get( hash(2014-04-26) + 2014-04-26)

Re: Help with row and column design

2014-04-29 Thread Ted Yu
bq. I believe you can batch into one call. See the following API in HTable for batching Gets: public Result[] get(List<Get> gets) throws IOException On Tue, Apr 29, 2014 at 4:43 PM, Rendon, Carlos (KBB) cren...@kbb.com wrote: You don't do a scan, you do a series of gets, which I believe
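A sketch of batching the per-day Gets from the pseudocode above into one round trip with that API; it reuses the same hypothetical hash(date)+date key scheme and table/family/qualifier names as the earlier sketch:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.MD5Hash;

public class LastFiveDays {
  static byte[] rowKeyFor(String date) {   // same assumed hash(date)+date scheme as above
    String prefix = MD5Hash.getMD5AsHex(Bytes.toBytes(date)).substring(0, 4);
    return Bytes.add(Bytes.toBytes(prefix), Bytes.toBytes(date));
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "pageviews");   // hypothetical table
    try {
      List<Get> gets = new ArrayList<Get>();
      for (String date : new String[] {"2014-04-29", "2014-04-28", "2014-04-27",
                                       "2014-04-26", "2014-04-25"}) {
        gets.add(new Get(rowKeyFor(date)));
      }
      Result[] results = table.get(gets);   // one batched round trip instead of five separate Gets
      for (Result r : results) {
        byte[] v = r.getValue(Bytes.toBytes("m"), Bytes.toBytes("total_usa"));
        if (v != null) {
          System.out.println(Bytes.toString(r.getRow()) + " -> " + Bytes.toLong(v));
        }
      }
    } finally {
      table.close();
    }
  }
}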

Re: Help with row and column design

2014-04-29 Thread Software Dev
Yes. See total_usa vs. total_female_usa above. Basically you have to pre-store every level of aggregation you care about. Ok, I think this makes sense. Gets a bit hairy when doing, say, a shitload of gets though.. no? On Tue, Apr 29, 2014 at 4:43 PM, Rendon, Carlos (KBB) cren...@kbb.com wrote:

RE: Help with row and column design

2014-04-29 Thread Rendon, Carlos (KBB)
Gets a bit hairy when doing, say, a shitload of gets though.. no? If by hairy you mean the code is ugly, it was written for maximal clarity. I think you'll find a few sensible loops make it fairly clean. Otherwise I'm not sure what you mean. -Original Message- From: Software Dev

Re: ZK issue on one single RS.

2014-04-29 Thread Jean-Marc Spaggiari
After some time the node finally exited by itself (logs below) and I have been able to restart it correctly... The only thing I found in the ZK logs was this: 2014-04-29 11:28:00,026 [myid:3] - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@209] - Too many connections from /

Java Client Write Data blocked

2014-04-29 Thread jingych
Hi, all! I need help! I ran the Java client to write 3 million rows into HBase, but after writing almost 1 million, the process blocked without any exception. Does anyone know the possible reason, so I can find a solution? Thanks all! By the way, the HBase version is 0.94.6-cdh4.5.0!

Re: Java Client Write Data blocked

2014-04-29 Thread Jean-Marc Spaggiari
Any logs? Garbage collection on the server side? Network issue? Swap? Please share your master and region server logs so we can provide feedback. JM 2014-04-29 21:26 GMT-04:00 jingych jing...@neusoft.com: Hi, all! I need help! I ran the Java client to write 3 million rows into HBase,

Re: Re: Java Client Write Data blocked

2014-04-29 Thread jingych
Thanks JM. But the log is too big; how can I post the log file? Queries from HBase are slower too. The network is OK, I'm sure. Does GC have a log file? And how can I check the swap? Sorry, I'm a rookie. jingych From: Jean-Marc Spaggiari Date: 2014-04-30 09:30 To: user; jingych Subject:

Re: Re: Java Client Write Data blocked

2014-04-29 Thread Jean-Marc Spaggiari
Look at when your data injection stopped, then look at the logs close to that time and see if there are any exceptions or anything. You can paste part of those logs into pastebin.com and post the link here. You might want to run some Performance Evaluation tests against your cluster to see if it gives

Re: Re: Java Client Write Data blocked

2014-04-29 Thread jingych
I found the compaction action in the region server log file: 2014-04-29 16:23:25,373 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~41.8 M/43820608, currentsize=13.9 M/14607128 for region gspt_jbxx,,1398754604428.d6fd8d39289985adda9a3e048b92a24b. in 2325ms,

Re: Re: Java Client Write Data blocked

2014-04-29 Thread Jean-Marc Spaggiari
This is the piece of code related to your stack trace: // check periodically to see if a system stop is requested if (Store.closeCheckInterval > 0) { bytesWritten += kv.getLength(); if (bytesWritten > Store.closeCheckInterval) {

Re: Re: Java Client Write Data blocked

2014-04-29 Thread jingych
Thanks JM! Master log: 2014-04-29 16:28:21,973 INFO org.apache.hadoop.hbase.master.AssignmentManager: The master has opened the region gspt_jbxx,,1398759815278.b94a0838f87658b577a0d94379a9e02d. that was online on cdh-datanode3,60020,1397641384262 2014-04-29 16:28:22,100 INFO

Re: Help with row and column design

2014-04-29 Thread Software Dev
Nothing against your code. I just meant that if we are doing a scan, say, for hourly metrics across a 6-month period, we are talking about 4K+ gets. Is that something that can easily be handled? On Tue, Apr 29, 2014 at 5:08 PM, Rendon, Carlos (KBB) cren...@kbb.com wrote: Gets a bit hairy when doing

Re: Help with row and column design

2014-04-29 Thread Ted Yu
As I said this afternoon: See the following API in HTable for batching Gets: public Result[] get(List<Get> gets) throws IOException Cheers On Tue, Apr 29, 2014 at 7:45 PM, Software Dev static.void@gmail.com wrote: Nothing against your code. I just meant that if we are doing a scan

Re: Help with row and column design

2014-04-29 Thread Software Dev
Ok, I didn't know if the sheer number of gets would be a limiting factor. Thanks On Tue, Apr 29, 2014 at 7:57 PM, Ted Yu yuzhih...@gmail.com wrote: As I said this afternoon: See the following API in HTable for batching Gets: public Result[] get(List<Get> gets) throws IOException Cheers

Re: Help with row and column design

2014-04-29 Thread Sreepathi
I guess you can pre-split tables manually, which avoids hotspotting. On Tue, Apr 29, 2014 at 8:08 PM, Software Dev static.void@gmail.com wrote: Any improvements in the row key design? If I always know we will be querying by country, could/should I prefix the row key with the country to
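For reference, a minimal sketch of creating a pre-split table with the 0.94-era admin API; the table and family names and the split points are illustrative (with hashed or salted keys you would split on the prefix space):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      HTableDescriptor htd = new HTableDescriptor("pageviews");  // hypothetical table
      htd.addFamily(new HColumnDescriptor("m"));
      // four regions up front over a hex-prefixed key space: (-inf,"4"), ["4","8"), ["8","c"), ["c",+inf)
      byte[][] splits = new byte[][] { Bytes.toBytes("4"), Bytes.toBytes("8"), Bytes.toBytes("c") };
      admin.createTable(htd, splits);
    } finally {
      admin.close();
    }
  }
}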