[ https://issues.apache.org/jira/browse/HBASE-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127887#comment-13127887 ]
Jonathan Hsieh commented on HBASE-4570:
---------------------------------------

@Ted I have a strange situation: with just the fixes (the first two patches, no instrumentation) I still get many failures in my test setup. However, with the extra instrumentation the failures seem to go away (it runs a long time without encountering problems). Note that in my table setup I have 10 cfs, each with 2 cols, so the instrumentation is written to always expect 20 KVs per row. I have two processes: one that does a filtered scan and twiddle, and another that just does a filtered scan and count.

I ran TestAcidGuarantees in a loop on the instrumented version. It eventually failed :(
{code}
Tests in error:
  testScanAtomicity(org.apache.hadoop.hbase.TestAcidGuarantees): Deferred
  testMixedAtomicity(org.apache.hadoop.hbase.TestAcidGuarantees): org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@54697123 closed
{code}
So TestAcidGuarantees still fails with the instrumented version; the failure showed up on the 10th iteration.
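The instrumentation described above amounts to a per-row invariant: with 10 column families of 2 columns each, every row a scanner returns should carry exactly 20 KeyValues, and no row key should show up in two consecutive results. Below is a minimal, HBase-free sketch of such a check in plain Java, assuming each scanned row is modelled as its row key plus a list of "family:qualifier" strings. The `RowCheck` class, `ScannedRow` type, and `checkRows`/`cells` helpers are hypothetical illustrations, not code from the attached patches; real client code would get the same information from `Result.getRow()` and `Result.size()`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the scan-side sanity check (not the attached instrumentation). */
public class RowCheck {
    // 10 column families x 2 columns each, matching the test table setup.
    static final int EXPECTED_KVS = 10 * 2;

    /** Stand-in for one scanner Result: row key plus "family:qualifier" cells. */
    static final class ScannedRow {
        final String rowKey;
        final List<String> cells;

        ScannedRow(String rowKey, List<String> cells) {
            this.rowKey = rowKey;
            this.cells = cells;
        }
    }

    /**
     * Reports every violation: a result with the wrong number of KVs, or a row
     * key repeated in consecutive results (one logical row split in two).
     */
    static List<String> checkRows(List<ScannedRow> rows) {
        List<String> problems = new ArrayList<>();
        String previousKey = null;
        for (int i = 0; i < rows.size(); i++) {
            ScannedRow r = rows.get(i);
            if (r.cells.size() != EXPECTED_KVS) {
                problems.add("result #" + i + " (" + r.rowKey + ") has "
                        + r.cells.size() + " KVs, expected " + EXPECTED_KVS);
            }
            if (r.rowKey.equals(previousKey)) {
                problems.add("row " + r.rowKey + " split across results #"
                        + (i - 1) + " and #" + i);
            }
            previousKey = r.rowKey;
        }
        return problems;
    }

    /** Builds cells for families f{from}..f{to-1}, two columns (data, qual) each. */
    static List<String> cells(int from, int to) {
        List<String> out = new ArrayList<>();
        for (int f = from; f < to; f++) {
            out.add("f" + f + ":data");
            out.add("f" + f + ":qual");
        }
        return out;
    }

    public static void main(String[] args) {
        List<ScannedRow> rows = new ArrayList<>();
        rows.add(new ScannedRow("row0000024460", cells(0, 10))); // intact row
        rows.add(new ScannedRow("row0000024461", cells(0, 5)));  // f0..f4 only
        rows.add(new ScannedRow("row0000024461", cells(5, 10))); // f5..f9 only
        for (String p : checkRows(rows)) {
            System.out.println(p);
        }
    }
}
```

With the split row from the issue description, the check flags both halves as short (10 KVs each) and reports the repeated row key.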
{code}
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 127.479 sec
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 121.662 sec
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 117.508 sec
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 124.208 sec
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 121.513 sec
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 120.472 sec
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 117.869 sec
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 120.435 sec
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 118.946 sec
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
Tests run: 3, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 85.81 sec <<< FAILURE!
Tests run: 3, Failures: 0, Errors: 2, Skipped: 0
{code}

> Scan ACID problem with concurrent puts.
> ---------------------------------------
>
>                 Key: HBASE-4570
>                 URL: https://issues.apache.org/jira/browse/HBASE-4570
>             Project: HBase
>          Issue Type: Bug
>          Components: client, regionserver
>    Affects Versions: 0.90.1, 0.90.3
>            Reporter: Jonathan Hsieh
>         Attachments: 4570-instrumentation.tgz, hbase-4570.tgz
>
>
> When scanning a table, sometimes rows that have multiple column families get
> split into two rows if there are concurrent writes. In this particular case
> we are overwriting the contents of a Get directly back onto itself as a Put.
> For example, this is a 10-cf row (with "f0", "f1", .. "f9" cfs). It is
> actually returned as two rows (#55 and #56).
> Interestingly, if the two were merged we would have a single proper row.
> Row row0000024461 had time stamps: [55:
> keyvalues={row0000024461/f0:data/1318200440867/Put/vlen=1000,
> row0000024461/f0:qual/1318200440867/Put/vlen=10,
> row0000024461/f1:data/1318200440867/Put/vlen=1000,
> row0000024461/f1:qual/1318200440867/Put/vlen=10,
> row0000024461/f2:data/1318200440867/Put/vlen=1000,
> row0000024461/f2:qual/1318200440867/Put/vlen=10,
> row0000024461/f3:data/1318200440867/Put/vlen=1000,
> row0000024461/f3:qual/1318200440867/Put/vlen=10,
> row0000024461/f4:data/1318200440867/Put/vlen=1000,
> row0000024461/f4:qual/1318200440867/Put/vlen=10},
> 56: keyvalues={row0000024461/f5:data/1318200440867/Put/vlen=1000,
> row0000024461/f5:qual/1318200440867/Put/vlen=10,
> row0000024461/f6:data/1318200440867/Put/vlen=1000,
> row0000024461/f6:qual/1318200440867/Put/vlen=10,
> row0000024461/f7:data/1318200440867/Put/vlen=1000,
> row0000024461/f7:qual/1318200440867/Put/vlen=10,
> row0000024461/f8:data/1318200440867/Put/vlen=1000,
> row0000024461/f8:qual/1318200440867/Put/vlen=10,
> row0000024461/f9:data/1318200440867/Put/vlen=1000,
> row0000024461/f9:qual/1318200440867/Put/vlen=10}]
> I've only tested this on 0.90.1+patches and 0.90.3+patches, but it is
> consistent and reproducible.
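The observation that merging results #55 and #56 yields a single proper row can be checked mechanically by coalescing consecutive results that share a row key. The following is a minimal plain-Java sketch under the same assumptions as any HBase-free model: each result is a row key plus "family:qualifier" strings, and the `MergeSplitRows` class and its `ScannedRow`, `mergeAdjacent` and `cells` members are hypothetical names for illustration only. It is a diagnostic for confirming the halves are complete, not a fix: gluing rows together on the client would hide the atomicity violation rather than resolve it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic: coalesce adjacent results that carry the same row key. */
public class MergeSplitRows {
    /** Stand-in for one scanner Result: row key plus "family:qualifier" cells. */
    static final class ScannedRow {
        final String rowKey;
        final List<String> cells;

        ScannedRow(String rowKey, List<String> cells) {
            this.rowKey = rowKey;
            this.cells = new ArrayList<>(cells); // defensive copy so merging never mutates input
        }
    }

    /** Returns a new list in which consecutive results with equal row keys are merged. */
    static List<ScannedRow> mergeAdjacent(List<ScannedRow> rows) {
        List<ScannedRow> merged = new ArrayList<>();
        for (ScannedRow r : rows) {
            ScannedRow last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && last.rowKey.equals(r.rowKey)) {
                last.cells.addAll(r.cells); // same row returned twice: append the second half
            } else {
                merged.add(new ScannedRow(r.rowKey, r.cells));
            }
        }
        return merged;
    }

    /** Builds cells for families f{from}..f{to-1}, two columns (data, qual) each. */
    static List<String> cells(int from, int to) {
        List<String> out = new ArrayList<>();
        for (int f = from; f < to; f++) {
            out.add("f" + f + ":data");
            out.add("f" + f + ":qual");
        }
        return out;
    }

    public static void main(String[] args) {
        // Results #55 and #56 from the description: f0..f4, then f5..f9.
        List<ScannedRow> scanned = List.of(
                new ScannedRow("row0000024461", cells(0, 5)),
                new ScannedRow("row0000024461", cells(5, 10)));
        List<ScannedRow> merged = mergeAdjacent(scanned);
        System.out.println(merged.size() + " row(s), " + merged.get(0).cells.size() + " KVs");
    }
}
```

Running `main` prints `1 row(s), 20 KVs`, i.e. the two halves together hold exactly the 20 KVs the row should have.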