Re: Using HBase for Deduping

2013-02-15 Thread Asaf Mesika
Michael, this means a read for every write? On Friday, February 15, 2013, Michael Segel wrote: What constitutes a duplicate? An oversimplification is to do an HTable.checkAndPut(), where you do the put only if the column doesn't exist. Then, if the row is inserted (return value TRUE), you push the
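
The put-if-absent idea in this message can be illustrated with a minimal in-memory sketch. This is not the HBase client API: `DedupTable`, `check_and_put`, and `dedupe` are hypothetical names, and a lock stands in for the server-side atomicity that the thread attributes to checkAndPut(). It only shows the control flow, where a True return marks a net-new row and a False return marks a duplicate.

```python
import threading


class DedupTable:
    """In-memory stand-in for a table with an atomic put-if-absent.

    Hypothetical model for illustration only; it mimics the check-and-put
    semantics discussed in the thread, not the HBase client API.
    """

    def __init__(self):
        self._cells = {}
        self._lock = threading.Lock()

    def check_and_put(self, row_key, value):
        """Store value under row_key only if the key is absent.

        Returns True when the write happened (net-new row) and False
        when the row already existed (duplicate).
        """
        with self._lock:  # the check and the write happen as one atomic step
            if row_key in self._cells:
                return False
            self._cells[row_key] = value
            return True


def dedupe(events, table):
    """Return only the (key, value) events whose key was not seen before."""
    return [(key, value) for key, value in events if table.check_and_put(key, value)]


if __name__ == "__main__":
    table = DedupTable()
    events = [("e1", "a"), ("e2", "b"), ("e1", "a-again")]
    # Only the first occurrence of each key is passed downstream.
    print(dedupe(events, table))
```

In this model the caller never issues a separate read: the existence check and the write are a single call, which is the distinction the "read for every write" question is probing.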

Re: hbase-master-server slept

2013-02-15 Thread So Hibino
Hi, We are considering updating our version of HBase. The VM spec is below. CPU: 2 cores, MEMORY: 4 GB. We don't know the hardware spec of the host server, because we rent the VM from a VPS provider. Additionally, I checked vmstat for the time spanning the issue. No batch or online job was running on this server

Re: Using HBase for Deduping

2013-02-15 Thread Asaf Mesika
Then maybe he can place each event in the same rowkey, but with a column qualifier which is the timestamp of the event saved as a long. Upon preCompact in a region observer, he can filter out, for any row, all columns but the first? On Friday, February 15, 2013, Anoop Sam John wrote: When max versions set

RE: Using HBase for Deduping

2013-02-15 Thread Anoop Sam John
Or maybe go with a large value for max versions and put the duplicate entry. Then, in the compaction, you need a wrapper for InternalScanner whose next() method returns only the first KV, removing the others... The same kind of logic will also be needed while scanning. This will be good enough IMO
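
The "return only the first KV per row" filtering described in this message can be sketched outside HBase as follows. This is a plain-Python model, not an InternalScanner wrapper or a coprocessor: the `(row_key, qualifier, value)` tuple layout and the `keep_first_per_row` name are assumptions made for illustration, with the qualifier treated as a numeric event timestamp so that "first" means the smallest qualifier in each row.

```python
from itertools import groupby
from operator import itemgetter


def keep_first_per_row(cells):
    """Keep only the first cell of each row, ordered by (row_key, qualifier).

    cells is an iterable of (row_key, qualifier, value) tuples; the layout
    is a hypothetical stand-in for KeyValues, used only to show the logic.
    """
    # Sort the way a scanner would present cells: by row, then by qualifier.
    ordered = sorted(cells, key=itemgetter(0, 1))
    # For every row, emit the first cell and drop the remaining duplicates.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(0))]


if __name__ == "__main__":
    cells = [
        ("row1", 20, "duplicate"),
        ("row1", 10, "first event"),
        ("row2", 5, "only event"),
    ]
    print(keep_first_per_row(cells))
```

The same function could serve both paths mentioned above (compaction and scan), since both need the identical "first cell wins" rule.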

Re: Using HBase for Deduping

2013-02-15 Thread Michael Segel
On Feb 15, 2013, at 3:07 AM, Asaf Mesika asaf.mes...@gmail.com wrote: Michael, this means read for every write? Yes and no. At the macro level, a read for every write would mean that your client would read a record from HBase, and then based on some logic it would either write a record,

Re: Using HBase for Deduping

2013-02-15 Thread Michael Segel
But then he can't trigger an event if it's a net new row. Methinks that he needs to better define the problem he is trying to solve. Also the number of events. A billion an hour or 300K events a second? (OK, it's 277.78K events a second.) On Feb 14, 2013, at 10:19 PM, Anoop Sam John
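
The rate conversion quoted in this message (a billion events an hour expressed per second) is simple arithmetic; a small sketch makes it easy to check. The function name `events_per_second` is just an illustrative choice.

```python
SECONDS_PER_HOUR = 60 * 60


def events_per_second(events_per_hour):
    """Convert an hourly event count into an average per-second rate."""
    return events_per_hour / SECONDS_PER_HOUR


if __name__ == "__main__":
    rate = events_per_second(1_000_000_000)
    # One billion events per hour, expressed in thousands per second.
    print(f"{rate / 1000:.2f}K events per second")  # prints "277.78K events per second"
```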

Re: question about pre-splitting regions

2013-02-15 Thread Doug Meil
Good to hear! Given your experience, I'd appreciate your feedback on the section 6.3.6. Relationship Between RowKeys and Region Splits in... http://hbase.apache.org/book.html#schema.creation ... because it's on that same topic. Any other points to add to this? Thanks! On 2/14/13 11:08 PM,

Re: Using HBase for Deduping

2013-02-15 Thread Rahul Ravindran
I had tried checkAndPut yesterday with a null passed as the value and it had thrown an exception when the row did not exist. Perhaps, I was doing something wrong. Will try that again, since, yes, I would prefer a checkAndPut(). From: Michael Segel

Re: Using HBase for Deduping

2013-02-15 Thread Michael Segel
Interesting. Surround with a Try Catch? But it sounds like you're on the right path. Happy Coding! On Feb 15, 2013, at 11:12 AM, Rahul Ravindran rahu...@yahoo.com wrote: I had tried checkAndPut yesterday with a null passed as the value and it had thrown an exception when the row did not

queries and MR jobs

2013-02-15 Thread Pamecha, Abhishek
Hi, Is there a way to partition HDFS [replication factor, say 3] or route requests to specific RS nodes so that: one set of nodes serves operations like put and get, etc.; another set of nodes does MR on the same replicated data set; and those two sets don't share the same nodes? I mean, if we are

Re: storing lists in columns

2013-02-15 Thread Jean-Marc Spaggiari
Hi Stas, A few options come to mind. Quickly: 1) Why not store the products in specific columns instead of in the same one? Like: table, rowid1, cf:list, c:aa, value:true table, rowid1, cf:list, c:bb, value:true table, rowid1, cf:list, c:cc, value:true table, rowid2, cf:list, c:aabb,

Re: debugging responseTooSlow

2013-02-15 Thread Ted Yu
The slow response took about 1.5 minutes. During this period, did you observe high latency? If you have Ganglia installed on master / NN node, do you observe an abnormal spike? BTW, did you presplit your table? Thanks On Fri, Feb 15, 2013 at 7:14 PM, Viral Bajaria viral.baja...@gmail.com wrote:

Re: debugging responseTooSlow

2013-02-15 Thread Kevin O'dell
If you take a look at sar from 2013-02-16 on 10.149.10.10 <http://10.149.10.10:41017/>, do you see any major I/O wait, swapping, or anything out of the norm? Is this occurring on all three region servers? When the perf test is running, can you verify you are writing to all three nodes? On Fri, Feb

[ANNOUNCE] HBase 0.94.5 is available for download

2013-02-15 Thread lars hofhansl
The HBase Team is pleased to announce the release of HBase 0.94.5. Download it from your favorite Apache mirror [1]. HBase 0.94.5 is a bug fix release and has 76 issues resolved against it. 0.94.5 is the current stable release of HBase. All previous 0.92.x and 0.94.x releases can be upgraded to

Re: debugging responseTooSlow

2013-02-15 Thread Ted Yu
Viral: Did you use YCSB or LoadTestTool? Was the load spread relatively evenly across your servers? Thanks On Fri, Feb 15, 2013 at 9:19 PM, Viral Bajaria viral.baja...@gmail.com wrote: Yeah I noticed very high latency around the time of slow response, basically my client timed out for those