[
https://issues.apache.org/jira/browse/HBASE-15811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281060#comment-15281060
]
stack commented on HBASE-15811:
-------------------------------
Thanks [~rfarrjr]
This issue is a good one.
# A batch of Puts comes in.
# We make it to HRegion#doMiniBatchMutation
# It adds the edits to WAL with append, then to memstore, then calls sync, and
then updates mvcc.
# Down in the sync, we add our sync request to the running sync threads.
# They send the sync and wait on return.
# It returns. We let blocked handlers go.
# They return to the client.
# Client comes back in to read its own writes.
TO BE CONFIRMED, it seems like the remote client can make a query IN BETWEEN
the sync and the update of mvcc.
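To make the suspected window concrete, here is a minimal single-threaded sketch of the ordering described in the steps above. It is a toy model, not HBase code: MvccRaceSketch, Cell, get and readerSeesWrite are hypothetical names, and the only things taken from this issue are the order of operations (memstore add, then sync, then mvcc update) and the read points 637 and 638 from the log below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Toy model only (assumption): none of these names are HBase classes.
public class MvccRaceSketch {
    // Hypothetical stand-in for a memstore cell tagged with its write number.
    static final class Cell {
        final String row;
        final long writeNumber;

        Cell(String row, long writeNumber) {
            this.row = row;
            this.writeNumber = writeNumber;
        }
    }

    // Read point before the batch lands; 637 mirrors the value in the log.
    private final AtomicLong readPoint = new AtomicLong(637);
    private final List<Cell> memstore = new ArrayList<>();

    // A read snapshots the read point once, at "scanner creation", and only
    // sees cells whose write number is at or below that snapshot.
    boolean get(String row) {
        long snapshot = readPoint.get();
        for (Cell c : memstore) {
            if (c.row.equals(row) && c.writeNumber <= snapshot) {
                return true;
            }
        }
        return false;
    }

    // Replays the write path in order and issues one read, either inside the
    // suspected window (after sync, before the mvcc update) or after it.
    static boolean readerSeesWrite(boolean readAfterMvccUpdate) {
        MvccRaceSketch region = new MvccRaceSketch();
        long writeNumber = 638;

        // WAL append and memstore add (the WAL itself is not modelled).
        region.memstore.add(new Cell("row-1", writeNumber));
        // The sync returns here; under the hypothesis the client can already
        // come back in with its Get.
        if (readAfterMvccUpdate) {
            region.readPoint.set(writeNumber);
        }
        boolean seen = region.get("row-1");
        // The mvcc update finally happens (a no-op if it was done above).
        region.readPoint.set(writeNumber);
        return seen;
    }

    public static void main(String[] args) {
        // prints "read inside window sees write: false"
        System.out.println("read inside window sees write: " + readerSeesWrite(false));
        // prints "read after mvcc update sees write: true"
        System.out.println("read after mvcc update sees write: " + readerSeesWrite(true));
    }
}
```

If the hypothesis holds, the empty Get is just a read whose snapshot was taken at 637 while the cell it wanted was already in the memstore at 638. That would also fit the re-get succeeding a second later.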
I captured this in the log:
{code}
7357 2016-05-11 16:19:51,511 INFO [B.defaultRpcServer.handler=151,queue=151,port=16020] regionserver.HRegion: mvcc.readPoint=638, a12e7c7829e37a16f4144b03e35e3532
7358 2016-05-11 16:19:51,512 INFO [B.defaultRpcServer.handler=36,queue=36,port=16020] regionserver.HRegion: SPIN EMPTY 637 test_farr,0,1463008764533.a12e7c7829e37a16f4144b03e35e3532.
{code}
The first line is logging I added just after we'd updated the mvcc in
doMiniBatchMutation.
The second line is the case where a Get got nothing back even though the client
had just written the value. See how the readPoint at write is at 638 but the
read point for the Scan/Get is at 637... Somehow, at creation of the Scan, it
got a readpoint from before the update. Or there is something wrong w/
AtomicLong (smile).
Let me see if I can artificially recreate.
> Batch Get after batch Put does not fetch all Cells
> --------------------------------------------------
>
> Key: HBASE-15811
> URL: https://issues.apache.org/jira/browse/HBASE-15811
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 1.2.1
> Reporter: stack
> Assignee: stack
> Attachments: Test.java, Test2.java
>
>
> A big batch put followed by a batch get does not always return all Cells put.
> See attached test program by Robert Farr that reproduces the issue. It seems
> to be an issue to do with a cluster of more than one machine. Running against
> a single machine does not have the problem (though the single machine may
> have many regions). Robert was unable to make his program fail with a single
> machine only.
> I reproduced what Robert was seeing running his program. I was also unable to
> make a single machine fail. In a batch of 1000 puts, I see one to three Gets
> fail. I noticed too that if I wait a second after a fail and then re-get, the
> Get succeeds.