[ https://issues.apache.org/jira/browse/HBASE-15811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282077#comment-15282077 ]
stack commented on HBASE-15811:
-------------------------------
This seems to be the cause (present since 0.99 and 0.98.4):
{code}
commit ab72babd97838317fa0a380fc4d49bf2703ad17c
Author: Nicolas Liochon <[email protected]>
Date: Tue Jun 24 11:37:02 2014 +0200
HBASE-11403 Fix race conditions around Object#notify
diff --git a/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java b/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java
index 42c1546..7b153ec 100644
--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java
...
@@ -979,6 +980,7 @@ class AsyncProcess {
       oldInProgress = currentInProgress;
       try {
         synchronized (this.tasksInProgress) {
+          if (tasksInProgress.get() != oldInProgress) break;
           this.tasksInProgress.wait(100);
         }
       } catch (InterruptedException e) {
{code}
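We have what is supposed to be a wait-until-done, but the above change to
AsyncProcess.java broke it. We are supposed to wait until there are no more
tasks in flight -- tasks (the max param passed in) is zero. Instead, the added
check breaks out of the while loop completely as soon as the in-progress count
differs from the value we last read, even if tasks are still outstanding.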
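
To make the failure mode concrete, here is a minimal, single-threaded sketch of
the loop shape shown in the diff above. It is not the HBase code: the class
WaitLoopSketch, the breakOnChange flag, the betweenReadAndLock hook and the
remainingAfterWait helper are all hypothetical names introduced for
illustration. The hook stands in for a task that completes between reading the
counter and entering the synchronized block. The "continue" branch is only one
assumed way of re-checking the loop condition, not necessarily the fix that was
applied.

```java
import java.util.concurrent.atomic.AtomicLong;

class WaitLoopSketch {
    /**
     * Simplified model of the wait loop from the diff. Returns how many tasks
     * are still in flight when the method returns.
     * breakOnChange=true mimics the added "break"; false re-checks the loop
     * condition instead (an assumed alternative, for comparison only).
     */
    static long waitForMaximumCurrentTasks(AtomicLong tasksInProgress, int max,
            boolean breakOnChange, Runnable betweenReadAndLock) {
        long currentInProgress;
        long oldInProgress;
        while ((currentInProgress = tasksInProgress.get()) > max) {
            oldInProgress = currentInProgress;
            // Hypothetical hook: simulates a task finishing at exactly this point.
            betweenReadAndLock.run();
            try {
                synchronized (tasksInProgress) {
                    if (tasksInProgress.get() != oldInProgress) {
                        if (breakOnChange) {
                            break;    // leaves the while loop, even if count > max
                        }
                        continue;     // goes back to the while condition
                    }
                    tasksInProgress.wait(100);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return tasksInProgress.get();
    }

    /** Starts with two tasks in flight; one completes each time the hook runs. */
    static long remainingAfterWait(boolean breakOnChange) {
        AtomicLong tasks = new AtomicLong(2);
        Runnable completeOneTask = () -> {
            if (tasks.get() > 0) {
                tasks.decrementAndGet();
            }
        };
        return waitForMaximumCurrentTasks(tasks, 0, breakOnChange, completeOneTask);
    }

    public static void main(String[] args) {
        System.out.println("break on change:    " + remainingAfterWait(true));
        System.out.println("re-check on change: " + remainingAfterWait(false));
    }
}
```

In this model the "break" variant returns while one task is still in flight,
whereas the re-check variant returns only once the count has reached max. A
caller that assumes all writes have completed when the wait returns could then
read too early, which would be consistent with the missing Cells described in
the issue.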
> Batch Get after batch Put does not fetch all Cells
> --------------------------------------------------
>
> Key: HBASE-15811
> URL: https://issues.apache.org/jira/browse/HBASE-15811
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 1.3.0, 1.2.1
> Reporter: stack
> Assignee: stack
> Priority: Blocker
> Fix For: 1.3.0, 1.2.1
>
> Attachments: Test.java, Test2.java
>
>
> A big batch Put followed by a batch Get does not always return all the Cells
> that were put. See the attached test program by Robert Farr, which reproduces
> the issue. It seems to happen only on a cluster of more than one machine:
> Robert was unable to make his program fail against a single machine, even
> though that machine may host many regions.
> I reproduced what Robert was seeing by running his program, and I was also
> unable to make a single machine fail. In a batch of 1000 Puts, I see one to
> three Gets fail. I also noticed that if I wait a second after a failure and
> then re-get, the Get succeeds.