[
https://issues.apache.org/jira/browse/HBASE-16132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359493#comment-15359493
]
Devaraj Das commented on HBASE-16132:
-------------------------------------
If you look at the RpcRetryingCallerWithReadReplicas.call() implementation,
it first does a poll() with a timeout, to give the primary a chance to respond:
{code}
try {
// wait for the timeout to see whether the primary responds back
Future<Result> f = cs.poll(timeBeforeReplicas, TimeUnit.MICROSECONDS);
// Yes, microseconds
if (f != null) {
return f.get(); //great we got a response
}
}
{code}
After that, it does a take() followed by a get():
{code}
try {
try {
Future<Result> f = cs.take();
return f.get();
} catch (ExecutionException e) {
throwEnrichedException(e, retries);
}
} catch (CancellationException e) {
{code}
ScannerCallableWithReplicas.call(), in contrast, does a poll() in both places.
But after the second poll(), it might be better to do a get(). That should take
care of throwing the exception (look at the implementation of get()). On a
related note, should the second call to poll() be replaced with a call to
take()? The two differ: poll() returns null if nothing completes within the
timeout, whereas take() blocks until a result is available. I haven't analyzed
the side effects of making that change.
I am okay with your patch, but wanted to bring the above up and see if it makes
sense.
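To make the poll() / take() difference concrete, here is a minimal sketch. It
uses the JDK's ExecutorCompletionService rather than the HBase client's own
completion service, so it only illustrates the general semantics and is not the
HBase code path. The class name, method names and the returned marker strings
are all made up for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class; not part of HBase.
class PollVsTakeSketch {

  /** poll() with a timeout: a null Future means "nothing finished yet". */
  static String pollThenGet(Callable<String> task, long timeoutMs) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      CompletionService<String> cs = new ExecutorCompletionService<>(pool);
      cs.submit(task);
      Future<String> f = cs.poll(timeoutMs, TimeUnit.MILLISECONDS);
      if (f == null) {
        // Timed out. This is NOT the same as "the task returned no data".
        return "TIMED_OUT";
      }
      return f.get(); // rethrows a task failure as ExecutionException
    } catch (ExecutionException e) {
      return "FAILED: " + e.getCause().getMessage();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "INTERRUPTED";
    } finally {
      pool.shutdownNow(); // cancel whatever is still running
    }
  }

  /** take(): blocks until some task completes, so it never yields null. */
  static String takeThenGet(Callable<String> task) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      CompletionService<String> cs = new ExecutorCompletionService<>(pool);
      cs.submit(task);
      return cs.take().get();
    } catch (ExecutionException e) {
      return "FAILED: " + e.getCause().getMessage();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "INTERRUPTED";
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    // A "busy server": the answer arrives later than the poll timeout.
    Callable<String> slow = () -> { Thread.sleep(300); return "row-1"; };
    System.out.println(pollThenGet(slow, 50));
    System.out.println(takeThenGet(slow));
  }
}
```

With a slow task, pollThenGet() hits the timeout branch while takeThenGet()
waits for the value. The point for the scanner code is that a caller which maps
the timeout branch to a null result cannot tell it apart from "no more rows".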
> Scan does not return all the result when regionserver is busy
> -------------------------------------------------------------
>
> Key: HBASE-16132
> URL: https://issues.apache.org/jira/browse/HBASE-16132
> Project: HBase
> Issue Type: Bug
> Reporter: binlijin
> Assignee: binlijin
> Attachments: HBASE-16132.patch, HBASE-16132_v2.patch,
> HBASE-16132_v3.patch, HBASE-16132_v3.patch, TestScanMissingData.java
>
>
> We have found a corner case: when a regionserver stays busy for a long
> time, some scanners may return null even though they have not scanned all
> the data. In ScannerCallableWithReplicas there is a case that is not handled
> correctly: when cs.poll() times out without returning any result, call()
> returns a null result, so the scan gets a null result and ends.
> {code}
> try {
> Future<Pair<Result[], ScannerCallable>> f = cs.poll(timeout,
> TimeUnit.MILLISECONDS);
> if (f != null) {
> Pair<Result[], ScannerCallable> r = f.get(timeout,
> TimeUnit.MILLISECONDS);
> if (r != null && r.getSecond() != null) {
> updateCurrentlyServingReplica(r.getSecond(), r.getFirst(), done,
> pool);
> }
> return r == null ? null : r.getFirst(); // great we got an answer
> }
> } catch (ExecutionException e) {
> RpcRetryingCallerWithReadReplicas.throwEnrichedException(e, retries);
> } catch (CancellationException e) {
> throw new InterruptedIOException(e.getMessage());
> } catch (InterruptedException e) {
> throw new InterruptedIOException(e.getMessage());
> } catch (TimeoutException e) {
> throw new InterruptedIOException(e.getMessage());
> } finally {
> // We get there because we were interrupted or because one or more of the
> // calls succeeded or failed. In all case, we stop all our tasks.
> cs.cancelAll();
> }
> return null; // unreachable
> {code}