[ https://issues.apache.org/jira/browse/HBASE-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805304#action_12805304 ]
stack commented on HBASE-1485:
------------------------------

From the list:
{code}
On Tue, Jan 26, 2010 at 9:36 AM, Rod Cope <rod.cope at openlogic dot com> wrote:
> Hi,
>
> I'm seeing behavior on 0.20.2 and 0.20.3 that doesn't seem quite right and
> would like to know if this is by design, a bug, or something I'm doing
> wrong.
>
> Background:
>
> When I do a put that includes a timestamp like this (conceptually I know
> this is not the actual API), it works just fine.
> put "table", "family", "column", "bbb", 12345
>
> Then, if I do another put in the same client code using the same timestamp
> like this...
> put "table", "family", "column", "aaa", 12345
>
> ...and I create a scanner, grab a Result, and iterate over all values using
> list(), I get this...
> "table", "family", "column", "aaa", 12345
>
> So far, so good. Now, if I truncate the table from the shell and run a new
> program that does a flush() on the table between the two puts, but does it
> in the same client program back-to-back, I also get the same results from
> list().
>
> -----
>
> Problem:
>
> Here's where the trouble starts. I truncate the table and run a new program
> that puts "bbb", flushes the table, and quits. Here's what I get from
> list():
> "table", "family", "column", "bbb", 12345
>
> Then I run another program that puts "aaa", flushes, and quits. Here's what
> I get from list():
> "table", "family", "column", "aaa", 12345
> "table", "family", "column", "bbb", 12345
>
> And if I then run a third program that puts "ccc", flushes, and quits, I get
> this from list():
> "table", "family", "column", "ccc", 12345
> "table", "family", "column", "bbb", 12345
> "table", "family", "column", "aaa", 12345
>
> I'm getting three different values for identical
> table/family/qualifier/timestamp tuples. Does this seem right? There also
> doesn't seem to be a defined sort order, probably because the timestamps are
> identical.
>
> Also, if instead of using list(), I use getMap(), then I always only get a
> single result. The single result is always the last item in the lists above
> (i.e., "bbb" then "bbb" then "aaa"). I get identical results from using
> getNoVersionMap().
>
> I suspect that this same behavior could occur when HBase decides to flush on
> its own, but I could be wrong. As you can imagine, this can cause problems
> because clients can't know from the results of calling list() which value is
> "right" or "newest". They also can't rely on getMap() or getNoVersionMap()
> because the single result that gets returned is not necessarily "right" or
> "newest".
>
> I've reproduced everything above in a stand-alone installation and also with
> a 7 regionserver cluster with the final 0.20.3. I started down this
> debugging path originally because I ran into this problem on the 7
> regionserver cluster with one table of 100+ regions. I was flushing
> programmatically at the end of some large imports because I'm doing
> setWriteToWAL(false) for load performance.
>
> Am I doing something wrong? Did I miss an HBase assumption about flushing
> and/or identical timestamps?
>
> Any help would be much appreciated.
{code}

> Wrong or indeterminate behavior when there are duplicate versions of a column
> -----------------------------------------------------------------------------
>
> Key: HBASE-1485
> URL: https://issues.apache.org/jira/browse/HBASE-1485
> Project: Hadoop HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.20.0
> Reporter: Jonathan Gray
> Fix For: 0.21.0
>
>
> As of now, both gets and scanners will end up returning all duplicate
> versions of a column. The ordering of them is indeterminate.
> We need to decide what the desired/expected behavior should be and make it
> happen.
> Note: It's nearly impossible for this to work with Gets as they are now
> implemented in 1304 so this is really a Scanner issue.
> To implement this correctly with Gets, we would have to undo basically all
> the optimizations that Gets do, making them far slower than a Scanner.
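The reported behavior depends on how a read merges several flushed files that hold cells with identical row/family/qualifier/timestamp coordinates. The following is a toy Python model of that mechanism, not HBase code. `Cell`, `StoreFile`, `seq_id`, `put`, `flush`, `scan`, `map_view` and `scan_newest_file_wins` are all hypothetical names.

The model makes three assumptions:

* Each program's flush writes one new immutable file.
* A scan merges those files by row, family and qualifier ascending and timestamp descending, without dropping equal keys.
* A map-style view, loosely analogous to getMap(), keeps whichever duplicate it sees last.

The final function sketches only one candidate policy for the "desired/expected behavior" the issue asks about: the cell from the newest flush wins. It is not necessarily what HBase adopted.

```python
import heapq
from typing import Dict, List, NamedTuple, Tuple

# Toy model only: none of these names are HBase classes or methods.
Key = Tuple[str, str, str, int]


class Cell(NamedTuple):
    row: str
    family: str
    qualifier: str
    timestamp: int
    value: str


class StoreFile(NamedTuple):
    seq_id: int        # hypothetical flush counter: higher means flushed later
    cells: List[Cell]  # sorted by sort_key, never modified after the flush


def sort_key(cell: Cell) -> Key:
    # Row, family and qualifier ascend; timestamps descend (newest first).
    return (cell.row, cell.family, cell.qualifier, -cell.timestamp)


def put(memstore: Dict[Key, Cell], cell: Cell) -> None:
    # Within one in-memory buffer, a second put with the same key replaces the first.
    memstore[sort_key(cell)] = cell


def flush(store_files: List[StoreFile], memstore: Dict[Key, Cell]) -> None:
    # Write the buffer out as a new immutable file, then empty the buffer.
    seq_id = len(store_files) + 1
    store_files.append(StoreFile(seq_id, sorted(memstore.values(), key=sort_key)))
    memstore.clear()


def scan(store_files: List[StoreFile]) -> List[Cell]:
    # Merge all files by key with no deduplication. heapq.merge breaks ties by
    # input position, so equal-key cells come out in the order the files were
    # passed in, which says nothing about which write happened last.
    return list(heapq.merge(*(f.cells for f in store_files), key=sort_key))


def map_view(cells: List[Cell]) -> Dict[str, Dict[str, Dict[int, str]]]:
    # family -> qualifier -> timestamp -> value; a later duplicate silently
    # overwrites an earlier one, so only the last cell in scan order survives.
    result: Dict[str, Dict[str, Dict[int, str]]] = {}
    for c in cells:
        result.setdefault(c.family, {}).setdefault(c.qualifier, {})[c.timestamp] = c.value
    return result


def scan_newest_file_wins(store_files: List[StoreFile]) -> List[Cell]:
    # One hypothetical policy: among cells with identical keys, keep only the
    # one from the most recently flushed file (highest seq_id).
    merged = heapq.merge(
        *([(sort_key(c), -f.seq_id, c) for c in f.cells] for f in store_files)
    )
    out: List[Cell] = []
    last_key = None
    for key, _neg_seq_id, cell in merged:
        if key != last_key:  # first cell seen for a key comes from the newest file
            out.append(cell)
            last_key = key
    return out


# Three separate "programs", each putting one value at ts=12345 and flushing.
files: List[StoreFile] = []
for value in ("bbb", "aaa", "ccc"):
    memstore: Dict[Key, Cell] = {}
    put(memstore, Cell("row1", "family", "column", 12345, value))
    flush(files, memstore)

print([c.value for c in scan(files)])                          # ['bbb', 'aaa', 'ccc']
print(map_view(scan(files))["family"]["column"][12345])        # ccc
print(map_view(scan(files[::-1]))["family"]["column"][12345])  # bbb
print([c.value for c in scan_newest_file_wins(files)])         # ['ccc']
```

In this model, merely reversing the file list flips which value the map view returns. That mirrors the complaint that clients cannot tell which duplicate is "right" or "newest". Tagging each file with its flush order is one way to make the tie-break deterministic. It comes at the cost of carrying and comparing that extra ordering information during every merge.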