> On 2010-10-24 20:05:55, Ryan Rawson wrote:
> > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java, line 377
> > <http://review.cloudera.org/r/1088/diff/2/?file=15909#file15909line377>
> >
> >     I'm not sure how this code is optional for your new 'upsert'; here are some use cases that I found painful:
> >
> >     - When the KV is in the snapshot, we can create new KVs in the memstore with the same TS. This means you have a dup, and before we had this new 'dup' code it ruined counts badly.
> >     - The Math.max() bit is to ensure the new KV isn't being created a bit in the past and accidentally duplicating a timestamp inside the snapshot.
> 
> Jonathan Gray wrote:
>     What do you mean by optional? There shouldn't be any real difference. This code is basically the exact same code that was there, but now pulled into a method that can be reused.
> 
> Ryan Rawson wrote:
>     The original updateColumnValue code was structured like this:
>     
>     - look in snapshot
>     - loop over kvset
>     
>     Create new KV, insert it
>     
>     - loop over kvset
>     
>     But you only included the 2nd half. Your new 'upsert' only includes half the implementation of updateColumnValue.
>     
>     This comes up in this scenario:
>     
>     - increment at TS=100
>     - snapshot; now we have kvset={} and a KV at TS=100 in snapshot
>     - increment at TS=100
>     
>     Without the code I have above, we'll end up with 2 KVs, both at TS=100, one in kvset and one in snapshot, which then becomes an HFile.
>     
>     When future reads happen, if they don't see the 'correct' KV, you will end up with lost updates.
>     
>     In theory the 'read memstore first' and 'order duplicate TS KVs' patches should resolve this. Lost updates reallllllly suck though. I'd probably recommend the conservative course, because I already went through this and it sucks.
>     If you decide that you are willing to risk it, I might recommend this particular test case:
>     
>     - increment some data at TS=X
>     - snapshot at TS=X
>     - increment some data at TS=X; now you have 2 * KV at TS=X
>     - flush to hfile
>     - increment at TS=X; ensure you don't lose any updates
>     - snapshot/flush with TS=X in kvset -> snapshot -> hfile
>     - compact the 2 hfiles, both with TS=X in them
>     - do a read and make sure the value is what you expect
>     
>     By using the manual environment edge you can control the currentTimeMillis() returned inside HRegion and down.
>     
>     This covers the worst-case scenario, I think, since normally one would get increments and you shouldn't get TS=X * 2 in a single HFile.
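[Editorial aside: the duplicate-timestamp hazard in Ryan's scenario, and the Math.max() guard he mentions, can be sketched in a toy simulation. This is plain Java, not HBase code; `Cell`, `read`, and the tie-break flag are all invented for illustration. The point it shows: two counter cells at the same TS=X make the read answer depend on an arbitrary tie-break (a lost update), while bumping the second write to TS=X+1 makes it unambiguous.]

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the duplicate-timestamp scenario above (not HBase code; all
// names invented). A cell is (timestamp, value); a read returns the value
// with the highest timestamp, breaking timestamp ties one way or the other.
public class LostUpdateDemo {

    public static final class Cell {
        public final long ts;
        public final long value;
        public Cell(long ts, long value) { this.ts = ts; this.value = value; }
    }

    // preferLater decides which duplicate-TS copy "wins" the tie.
    public static long read(List<Cell> cells, boolean preferLater) {
        long bestTs = Long.MIN_VALUE, bestVal = 0;
        boolean seen = false;
        for (Cell c : cells) {
            boolean better = !seen || c.ts > bestTs
                    || (c.ts == bestTs && preferLater);
            if (better) { bestTs = c.ts; bestVal = c.value; seen = true; }
        }
        return bestVal;
    }

    public static void main(String[] args) {
        long X = 100;  // clock pinned at X, as with a manual environment edge

        // Without the guard: both increments land at TS=X (one copy in the
        // snapshot/HFile, one in the kvset).
        List<Cell> dup = new ArrayList<>();
        dup.add(new Cell(X, 1));
        dup.add(new Cell(X, 2));
        // The answer depends on an arbitrary tie-break: 2 or 1.
        if (read(dup, true) != 2) throw new AssertionError();
        if (read(dup, false) != 1) throw new AssertionError(); // lost update!

        // With the Math.max(now, snapshotTs + 1) style guard: the second
        // increment is bumped to TS=X+1, so the read is unambiguous.
        List<Cell> guarded = new ArrayList<>();
        guarded.add(new Cell(X, 1));
        guarded.add(new Cell(X + 1, 2));
        if (read(guarded, true) != 2) throw new AssertionError();
        if (read(guarded, false) != 2) throw new AssertionError();
        System.out.println("ok");
    }
}
```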
Yeah, we should now be handling these duplicate version things correctly. I guess you're saying you don't necessarily trust it. I agree that a good test would be best; unfortunately I don't have time to write that now.

What's there now seems kind of hacky... if it finds something in the snapshot, it adds 1 to the timestamp and puts that in the memstore. But looking at your example above, wouldn't the existing behavior break? When you increment at TS=X and there is TS=X in the snapshot, it then inserts into MemStore as TS=X+1. Then if you increment again at TS=X, it won't be in the snapshot (it's been flushed), so when you look for the latest value / go to delete old ones, you do memstore.tailSet(TS=X). Right? I could be missing something.


> On 2010-10-24 20:05:55, Ryan Rawson wrote:
> > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java, line 495
> > <http://review.cloudera.org/r/1088/diff/2/?file=15909#file15909line495>
> >
> >     Yeah, you can't compare against memstoreTS, because if you have this in here you won't ever be able to increment values that were inserted into the future. You'd just leave them there, continually see them in the 'get' part, and then in this code bit leave them in place and create a new KV that is masked by the future KV.
> >
> >     It won't be possible for that insert to be part of an uncommitted change because of the rowlock, however. So no atomic-rules will have been broken.
> 
> Jonathan Gray wrote:
>     I'm not sure I follow. Am I doing it wrong? This should be exactly the same as what's there.
>     
>     Also, we are breaking the atomic-rules across the Increment. Each column is not broken, but across them it is (for reads, not writes).
>     
>     It seems like we could use RWCC for increments, though. I think it's fine that if you're using increment you can never write data into it w/ another API (or really, with manual timestamps).
> 
> Ryan Rawson wrote:
>     There is no easy way to include RWCC right now, because you have to be careful about 'committing' the new values before removing the old ones. This is a bit tricky because normally the RWCC 'commit' happens at the HRegion level, and right now we are inside memstore/store. Hopefully people won't need atomic reads of multi-incremented values. If one is doing counters, this would be the case. Javadoc should definitely yell about this.

Added more comments to the new methods' javadoc.


> On 2010-10-24 20:05:55, Ryan Rawson wrote:
> > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java, line 466
> > <http://review.cloudera.org/r/1088/diff/2/?file=15909#file15909line466>
> >
> >     I'm not sure I like the name upsert; it is too RDBMS-y for me.
> >
> >     I need to poke more at this, but I prefer the matchingRow() call; it encapsulates the getRowOffset junk, which leaks wayyy too much all over the place.
> 
> Jonathan Gray wrote:
>     Sure, could use those calls. Not even sure why I changed this; I wrote that part of this patch a few weeks ago.
>     
>     And it's an "update if it exists, insert if not" operation, which I think of as an upsert operation. Open to whatever tho.

Okay, changed those to matchingRow(), etc...


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1088/#review1641
-----------------------------------------------------------


On 2010-10-24 19:29:07, Jonathan Gray wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/1088/
> -----------------------------------------------------------
> 
> (Updated 2010-10-24 19:29:07)
> 
> 
> Review request for hbase, stack and khemani.
> 
> 
> Summary
> -------
> 
> Adds a new Increment class that allows multiple columns (each w/ its own increment amount) in a single row to be incremented in one call.
> 
> The big wins here are being able to do multiple columns in a row in a single RPC and having it be appended/synced to the WAL in a single call.
> 
> The current trade-off is that you lose atomicity to readers (i.e. this does not currently use RWCC). Eventually it could, but for the current use case I am building this for, it's okay like this.
> 
> 
> This addresses bug HBASE-2946.
>     http://issues.apache.org/jira/browse/HBASE-2946
> 
> 
> Diffs
> -----
> 
>   trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1026930
>   trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1026930
>   trunk/src/main/java/org/apache/hadoop/hbase/client/Increment.java PRE-CREATION
>   trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 1026930
>   trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1026930
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1026930
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1026930
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 1026930
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1026930
>   trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java 1026930
>   trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1026930
> 
> Diff: http://review.cloudera.org/r/1088/diff
> 
> 
> Testing
> -------
> 
> Added TestFromClientSide.testIncrement(), which adds some client-side tests of Increment (and mixing w/ the original icv call). That passes, and it's most of the way through a test suite run.
> 
> 
> Thanks,
> 
> Jonathan
> 
> 
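[Editorial aside: to make the Summary above concrete, here is a rough sketch of the client-side shape being described: one row, several columns, each with its own amount, bundled for a single RPC and a single WAL append. Since Increment.java is PRE-CREATION in this diff, the class below is a stand-in with invented names and signatures, not the actual API.]

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for the described client API (the real Increment class is
// PRE-CREATION in this diff; names and signatures are illustrative only).
public class IncrementSketch {

    public static class Increment {
        public final byte[] row;
        // "family:qualifier" -> amount, preserving insertion order
        public final Map<String, Long> columns = new LinkedHashMap<>();

        public Increment(byte[] row) { this.row = row; }

        // Each column carries its own increment amount; repeated calls
        // for the same column accumulate.
        public Increment addColumn(String family, String qualifier, long amount) {
            columns.merge(family + ":" + qualifier, amount, Long::sum);
            return this;
        }
    }

    public static void main(String[] args) {
        // Several counters in one row, sent as what would be one call.
        Increment inc = new Increment("row1".getBytes())
                .addColumn("counters", "hits", 1)
                .addColumn("counters", "bytes", 512)
                .addColumn("counters", "errors", 0);
        if (inc.columns.size() != 3) throw new AssertionError();
        if (inc.columns.get("counters:bytes") != 512L) throw new AssertionError();
        System.out.println("ok");
    }
}
```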

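[Editorial aside on the memstore.tailSet(TS=X) question raised earlier in the thread: memstore KVs sort newest-timestamp-first, so a tailSet anchored at TS=X only yields entries with TS <= X, and a KV that was bumped to TS=X+1 by the snapshot check sorts *before* the anchor and is skipped. Below is a small illustration using a plain TreeSet of timestamps with a reversed comparator; this is not HBase's actual KeyValue comparator.]

```java
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.TreeSet;

// Plain-Java illustration of the tailSet concern (not HBase's KeyValue
// comparator). With a descending-timestamp ordering, tailSet(X) returns
// everything that sorts at-or-after X, i.e. all timestamps <= X, so the
// bumped TS=X+1 entry is invisible to that view.
public class TailSetDemo {
    public static void main(String[] args) {
        // Descending timestamp order, like the memstore's KV ordering.
        NavigableSet<Long> kvset = new TreeSet<>(Comparator.reverseOrder());
        long X = 100;
        kvset.add(X + 1);  // the KV that was bumped past the snapshot TS
        kvset.add(X - 5);  // some older version

        NavigableSet<Long> seen = kvset.tailSet(X, true);
        if (seen.contains(X + 1)) throw new AssertionError(); // missed!
        if (!seen.contains(X - 5)) throw new AssertionError();
        System.out.println("ok");
    }
}
```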