[ https://issues.apache.org/jira/browse/HBASE-15811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281060#comment-15281060 ]
stack commented on HBASE-15811:
-------------------------------

Thanks [~rfarrjr]. This issue is a good one.

# A batch of Puts comes in.
# We make it to HRegion#doMiniBatchMutation.
# It adds the edits to the WAL with append, then to the memstore, then calls sync, and then updates mvcc.
# Down in the sync, we add our sync request to the running sync threads.
# They send the sync and wait on return.
# It returns. We let the blocked handlers go.
# They return to the client.
# Client comes back in to read its own writes.

TO BE CONFIRMED: it seems the remote client can make a query IN BETWEEN the sync and the update of mvcc. I captured this in the log:

{code}
2016-05-11 16:19:51,511 INFO [B.defaultRpcServer.handler=151,queue=151,port=16020] regionserver.HRegion: mvcc.readPoint=638, a12e7c7829e37a16f4144b03e35e3532
2016-05-11 16:19:51,512 INFO [B.defaultRpcServer.handler=36,queue=36,port=16020] regionserver.HRegion: SPIN EMPTY 637 test_farr,0,1463008764533.a12e7c7829e37a16f4144b03e35e3532.
{code}

The first line is logging I added just after we'd updated the mvcc in doMiniBatchMutation. The second line is the case where a Get got nothing back even though the value had just been written. See how the readPoint at write is at 638 but the read point for the Scan/Get is at 637... Somehow, at creation of the Scan, it got a readpoint from before it was updated. Or there is something wrong w/ AtomicLong (smile). Let me see if I can artificially recreate.

> Batch Get after batch Put does not fetch all Cells
> --------------------------------------------------
>
> Key: HBASE-15811
> URL: https://issues.apache.org/jira/browse/HBASE-15811
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 1.2.1
> Reporter: stack
> Assignee: stack
> Attachments: Test.java, Test2.java
>
>
> A big batch put followed by a batch get does not always return all Cells put.
> See attached test program by Robert Farr that reproduces the issue.
> It seems to be an issue to do with a cluster of more than one machine. Running against
> a single machine does not have the problem (though the single machine may
> have many regions). Robert was unable to make his program fail with a single
> machine only.
> I reproduced what Robert was seeing running his program. I was also unable to
> make a single machine fail. In a batch of 1000 puts, I see one to three Gets
> fail. I noticed too that if I wait a second after a fail and then re-get, the
> Get succeeds.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
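
For illustration only, the window suspected in the comment (a read arriving after the edit is in the memstore but before the mvcc read point is advanced) can be modeled with a toy sketch. This is not HBase code: `ReadPointSketch`, `applyWithoutPublishing`, `publish`, and `get` are hypothetical names invented for this example, and the 637/638 values simply mirror the two log lines. Whether this is the actual cause is still to be confirmed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Toy model only. All names here are hypothetical and are NOT HBase classes.
final class ReadPointSketch {

    // A stored value tagged with the sequence id of the write that produced it.
    static final class Versioned {
        final String value;
        final long seqId;

        Versioned(String value, long seqId) {
            this.value = value;
            this.seqId = seqId;
        }
    }

    private final Map<String, Versioned> memstore = new ConcurrentHashMap<>();

    // Starts at 637 to mirror the read point seen by the Get in the log.
    private final AtomicLong readPoint = new AtomicLong(637);

    // Models "add to memstore, then sync": the edit is stored with the next
    // sequence id, but the read point is not advanced yet.
    long applyWithoutPublishing(String row, String value) {
        long seqId = readPoint.get() + 1;
        memstore.put(row, new Versioned(value, seqId));
        return seqId;
    }

    // Models "update mvcc": advance the read point so the edit becomes visible.
    void publish(long seqId) {
        readPoint.set(seqId);
    }

    // A read snapshots the read point and ignores anything newer than it.
    String get(String row) {
        long snapshot = readPoint.get();
        Versioned v = memstore.get(row);
        return (v != null && v.seqId <= snapshot) ? v.value : null;
    }

    public static void main(String[] args) {
        ReadPointSketch region = new ReadPointSketch();
        long seqId = region.applyWithoutPublishing("row1", "v1");

        // A Get landing in the window reads at 637 and sees nothing.
        System.out.println("before publish: " + region.get("row1"));

        region.publish(seqId);

        // After the read point moves to 638 the same Get succeeds.
        System.out.println("after publish: " + region.get("row1"));
    }
}
```

In this model a Get issued before `publish` returns nothing and the same Get issued afterwards returns the value, which is consistent with the observation that a re-get a second later succeeds. The sketch is single-threaded on purpose so that the ordering is explicit; it does not attempt to reproduce the multi-machine behavior.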