> On 2010-10-24 20:05:55, Ryan Rawson wrote:
> > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java, line 377
> > <http://review.cloudera.org/r/1088/diff/2/?file=15909#file15909line377>
> >
> >     I'm not sure how this code is optional for your new 'upsert'; here are some use cases that I found painful:
> >
> >     - When the KV is in the snapshot, we can create new KVs in the memstore with the same TS. This means you have a dup, and before we had this new 'dup' code it ruined counts badly.
> >     - The Math.max() bit is to ensure the new KV isn't being created a bit in the past and accidentally duplicating a timestamp inside the snapshot.
> 
> Jonathan Gray wrote:
>     What do you mean by optional? There shouldn't be any real difference. This code is basically the exact same code that was there, but now pulled into a method that can be reused.
> 
> Ryan Rawson wrote:
>     The original updateColumnValue code was structured like this:
>     
>     - look in snapshot
>     - loop over kvset
>     
>     Create new KV, insert it
>     
>     - loop over kvset
>     
>     But you only included the 2nd half. Your new 'upsert' only includes half the implementation of updateColumnValue.
>     
>     This comes up in this scenario:
>     
>     - increment at TS=100
>     - snapshot; now we have kvset={} and a KV at TS=100 in snapshot
>     - increment at TS=100
>     
>     Without the code I have above, we'll end up with 2 KVs, both at TS=100, one in kvset and one in snapshot, which then becomes an HFile.
>     
>     When future reads happen, if they don't see the 'correct' KV, you will end up with lost updates.
>     
>     In theory the 'read memstore first' and 'order duplicate TS KVs' patches should resolve this. Lost updates reallllllly suck though. I'd probably recommend the conservative course, because I already went through this and it sucks.
>     If you decide that you are willing to risk it, I might recommend this particular test case:
>     
>     - increment some data at TS=X
>     - snapshot at TS=X
>     - increment some data at TS=X; now you have 2 * KV at TS=X
>     - flush to hfile
>     - increment at TS=X; ensure you don't lose any updates
>     - snapshot/flush with TS=X in kvset -> snapshot -> hfile
>     - compact the 2 hfiles, both with TS=X in them
>     - do a read and make sure the value is what you expect
>     
>     By using the manual environment edge you can control the currentTimeMillis() returned inside HRegion and down.
>     
>     This covers the worst-case scenario, I think, since normally one would get increments and you shouldn't get TS=X * 2 in a single HFile.
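[Editorial aside: the duplicate-timestamp hazard in Ryan's scenario, and the Math.max() guard he mentions, can be sketched in a toy simulation. This is plain Java, not HBase code; `Cell`, `read`, and the tie-break flag are all invented for illustration. The point it shows: two counter cells at the same TS=X make the read answer depend on an arbitrary tie-break (a lost update), while bumping the second write to TS=X+1 makes it unambiguous.]

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the duplicate-timestamp scenario above (not HBase code; all
// names invented). A cell is (timestamp, value); a read returns the value
// with the highest timestamp, breaking timestamp ties one way or the other.
public class LostUpdateDemo {

    public static final class Cell {
        public final long ts;
        public final long value;
        public Cell(long ts, long value) { this.ts = ts; this.value = value; }
    }

    // preferLater decides which duplicate-TS copy "wins" the tie.
    public static long read(List<Cell> cells, boolean preferLater) {
        long bestTs = Long.MIN_VALUE, bestVal = 0;
        boolean seen = false;
        for (Cell c : cells) {
            boolean better = !seen || c.ts > bestTs
                    || (c.ts == bestTs && preferLater);
            if (better) { bestTs = c.ts; bestVal = c.value; seen = true; }
        }
        return bestVal;
    }

    public static void main(String[] args) {
        long X = 100;  // clock pinned at X, as with a manual environment edge

        // Without the guard: both increments land at TS=X (one copy in the
        // snapshot/HFile, one in the kvset).
        List<Cell> dup = new ArrayList<>();
        dup.add(new Cell(X, 1));
        dup.add(new Cell(X, 2));
        // The answer depends on an arbitrary tie-break: 2 or 1.
        if (read(dup, true) != 2) throw new AssertionError();
        if (read(dup, false) != 1) throw new AssertionError(); // lost update!

        // With the Math.max(now, snapshotTs + 1) style guard: the second
        // increment is bumped to TS=X+1, so the read is unambiguous.
        List<Cell> guarded = new ArrayList<>();
        guarded.add(new Cell(X, 1));
        guarded.add(new Cell(X + 1, 2));
        if (read(guarded, true) != 2) throw new AssertionError();
        if (read(guarded, false) != 2) throw new AssertionError();
        System.out.println("ok");
    }
}
```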
Yeah, we should now be handling these duplicate version things correctly. I guess you're saying you don't necessarily trust it. I agree that a good test would be best; unfortunately I don't have time to write that now.

What's there now seems kind of hacky... if it finds something in the snapshot, it adds 1 to the timestamp and puts that in the memstore. But looking at your example above, wouldn't the existing behavior break? When you increment at TS=X and there is TS=X in the snapshot, it then inserts into MemStore as TS=X+1. Then if you increment again at TS=X, it won't be in the snapshot (it's been flushed), so when you look for the latest value / go to delete old ones, you do memstore.tailSet(TS=X). Right? I could be missing something.


> On 2010-10-24 20:05:55, Ryan Rawson wrote:
> > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java, line 495
> > <http://review.cloudera.org/r/1088/diff/2/?file=15909#file15909line495>
> >
> >     Yeah, you can't compare against memstoreTS, because if you have this in here you won't ever be able to increment values that were inserted into the future. You'd just leave them there, continually see them in the 'get' part, and then in this code bit leave them in place and create a new KV that is masked by the future KV.
> >
> >     It won't be possible for that insert to be part of an uncommitted change because of the rowlock, however. So no atomic-rules will have been broken.
> 
> Jonathan Gray wrote:
>     I'm not sure I follow. Am I doing it wrong? This should be exactly the same as what's there.
>     
>     Also, we are breaking the atomic-rules across the Increment. Each column is not broken, but across them it is (for reads, not writes).
>     
>     It seems like we could use RWCC for increments, though. I think it's fine that if you're using increment you can never write data into it w/ another API (or really, with manual timestamps).
> 
> Ryan Rawson wrote:
>     There is no easy way to include RWCC right now, because you have to be careful about 'committing' the new values before removing the old ones. This is a bit tricky because normally the RWCC 'commit' happens at the HRegion level, and right now we are inside memstore/store. Hopefully people won't need atomic reads of multi-incremented values. If one is doing counters, this would be the case. Javadoc should definitely yell about this.

Added more comments to the new methods' javadoc.


> On 2010-10-24 20:05:55, Ryan Rawson wrote:
> > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java, line 466
> > <http://review.cloudera.org/r/1088/diff/2/?file=15909#file15909line466>
> >
> >     I'm not sure I like the name upsert; it is too RDBMS-y for me.
> >
> >     I need to poke more at this, but I prefer the matchingRow() call; it encapsulates the getRowOffset junk, which leaks wayyy too much all over the place.
> 
> Jonathan Gray wrote:
>     Sure, could use those calls. Not even sure why I changed this; I wrote that part of this patch a few weeks ago.
>     
>     And it's an "update if it exists, insert if not" operation, which I think of as an upsert operation. Open to whatever tho.

Okay, changed those to matchingRow(), etc...


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1088/#review1641
-----------------------------------------------------------


On 2010-10-24 19:29:07, Jonathan Gray wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/1088/
> -----------------------------------------------------------
> 
> (Updated 2010-10-24 19:29:07)
> 
> 
> Review request for hbase, stack and khemani.
> 
> 
> Summary
> -------
> 
> Adds a new Increment class that allows multiple columns (each w/ its own increment amount) in a single row to be incremented in one call.
> 
> The big wins here are being able to do multiple columns in a row in a single RPC and having it be appended/synced to the WAL in a single call.
> 
> The current trade-off is that you lose atomicity to readers (i.e. this does not currently use RWCC). Eventually it could, but for the current use case I am building this for, it's okay like this.
> 
> 
> This addresses bug HBASE-2946.
>     http://issues.apache.org/jira/browse/HBASE-2946
> 
> 
> Diffs
> -----
> 
>   trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1026930
>   trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1026930
>   trunk/src/main/java/org/apache/hadoop/hbase/client/Increment.java PRE-CREATION
>   trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 1026930
>   trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1026930
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1026930
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1026930
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 1026930
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1026930
>   trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java 1026930
>   trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1026930
> 
> Diff: http://review.cloudera.org/r/1088/diff
> 
> 
> Testing
> -------
> 
> Added TestFromClientSide.testIncrement(), which adds some client-side tests of Increment (and mixing w/ the original icv call). That passes, and it's most of the way through a test suite run.
> 
> 
> Thanks,
> 
> Jonathan
> 
> 
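[Editorial aside: to make the Summary above concrete, here is a rough sketch of the client-side shape being described: one row, several columns, each with its own amount, bundled for a single RPC and a single WAL append. Since Increment.java is PRE-CREATION in this diff, the class below is a stand-in with invented names and signatures, not the actual API.]

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for the described client API (the real Increment class is
// PRE-CREATION in this diff; names and signatures are illustrative only).
public class IncrementSketch {

    public static class Increment {
        public final byte[] row;
        // "family:qualifier" -> amount, preserving insertion order
        public final Map<String, Long> columns = new LinkedHashMap<>();

        public Increment(byte[] row) { this.row = row; }

        // Each column carries its own increment amount; repeated calls
        // for the same column accumulate.
        public Increment addColumn(String family, String qualifier, long amount) {
            columns.merge(family + ":" + qualifier, amount, Long::sum);
            return this;
        }
    }

    public static void main(String[] args) {
        // Several counters in one row, sent as what would be one call.
        Increment inc = new Increment("row1".getBytes())
                .addColumn("counters", "hits", 1)
                .addColumn("counters", "bytes", 512)
                .addColumn("counters", "errors", 0);
        if (inc.columns.size() != 3) throw new AssertionError();
        if (inc.columns.get("counters:bytes") != 512L) throw new AssertionError();
        System.out.println("ok");
    }
}
```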

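[Editorial aside on the memstore.tailSet(TS=X) question raised earlier in the thread: memstore KVs sort newest-timestamp-first, so a tailSet anchored at TS=X only yields entries with TS <= X, and a KV that was bumped to TS=X+1 by the snapshot check sorts *before* the anchor and is skipped. Below is a small illustration using a plain TreeSet of timestamps with a reversed comparator; this is not HBase's actual KeyValue comparator.]

```java
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.TreeSet;

// Plain-Java illustration of the tailSet concern (not HBase's KeyValue
// comparator). With a descending-timestamp ordering, tailSet(X) returns
// everything that sorts at-or-after X, i.e. all timestamps <= X, so the
// bumped TS=X+1 entry is invisible to that view.
public class TailSetDemo {
    public static void main(String[] args) {
        // Descending timestamp order, like the memstore's KV ordering.
        NavigableSet<Long> kvset = new TreeSet<>(Comparator.reverseOrder());
        long X = 100;
        kvset.add(X + 1);  // the KV that was bumped past the snapshot TS
        kvset.add(X - 5);  // some older version

        NavigableSet<Long> seen = kvset.tailSet(X, true);
        if (seen.contains(X + 1)) throw new AssertionError(); // missed!
        if (!seen.contains(X - 5)) throw new AssertionError();
        System.out.println("ok");
    }
}
```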