[ 
https://issues.apache.org/jira/browse/HBASE-16132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359972#comment-15359972
 ] 

Yu Li commented on HBASE-16132:
-------------------------------

Thanks for the further clarification; I see your point now.

So both ScannerCallableWithReplicas and RpcRetryingCallerWithReadReplicas call 
{{Future.get}}; the main difference is that 
RpcRetryingCallerWithReadReplicas calls {{cs.take}} instead of {{cs.poll}} for 
the second replica, which means we will block with no timeout on the second 
one if the first replica timed out. Since RpcRetryingCallerWithReadReplicas is 
used by get, and a get is a special type of scan, I agree that it's better to 
follow the same approach in ScannerCallableWithReplicas.
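
To make the {{poll}} vs {{take}} difference concrete, here is a minimal sketch using the plain JDK {{ExecutorCompletionService}} as a stand-in. This is an assumption for illustration only: the HBase client uses its own completion service class, and the class and method names below ({{PollVsTakeSketch}} and its methods) are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Illustration only: JDK classes stand in for the HBase completion service.
class PollVsTakeSketch {

    /** Returns true if poll() gave up (returned null) before the slow task finished. */
    static boolean pollReturnsNullOnTimeout(long taskMillis, long pollMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            ExecutorCompletionService<String> cs = new ExecutorCompletionService<>(pool);
            cs.submit(() -> {
                Thread.sleep(taskMillis); // simulates a busy server
                return "rows";
            });
            // poll() waits at most pollMillis, then returns null if nothing completed.
            Future<String> f = cs.poll(pollMillis, TimeUnit.MILLISECONDS);
            return f == null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // interrupts the still-sleeping task, if any
        }
    }

    /** take() has no timeout: it blocks until some submitted task completes. */
    static String takeWaitsForResult(long taskMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            ExecutorCompletionService<String> cs = new ExecutorCompletionService<>(pool);
            cs.submit(() -> {
                Thread.sleep(taskMillis);
                return "rows";
            });
            return cs.take().get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("poll timed out: " + pollReturnsNullOnTimeout(2000, 50));
        System.out.println("take returned: " + takeWaitsForResult(100));
    }
}
```

With {{poll}} the caller has to decide what a null future means; with {{take}} the caller never sees null but can wait for as long as the task takes.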

Let me push this patch in first (to solve the problem) and open another JIRA 
for your proposal, sir (a new JIRA will be more visible, so we can better see 
others' thoughts :-)). Thanks for pointing this out, [~devaraj].

> Scan does not return all the result when regionserver is busy
> -------------------------------------------------------------
>
>                 Key: HBASE-16132
>                 URL: https://issues.apache.org/jira/browse/HBASE-16132
>             Project: HBase
>          Issue Type: Bug
>            Reporter: binlijin
>            Assignee: binlijin
>         Attachments: HBASE-16132.patch, HBASE-16132_v2.patch, 
> HBASE-16132_v3.patch, HBASE-16132_v3.patch, TestScanMissingData.java
>
>
> We have found a corner case: when a regionserver stays busy for a long 
> time, some scanners may return null even though they have not scanned all 
> the data.
> In ScannerCallableWithReplicas there is a case that is not handled 
> correctly: when cs.poll times out without returning any result, the method 
> returns null, so the scan gets a null result and ends.
>  {code}
>     try {
>       Future<Pair<Result[], ScannerCallable>> f = cs.poll(timeout, TimeUnit.MILLISECONDS);
>       if (f != null) {
>         Pair<Result[], ScannerCallable> r = f.get(timeout, TimeUnit.MILLISECONDS);
>         if (r != null && r.getSecond() != null) {
>           updateCurrentlyServingReplica(r.getSecond(), r.getFirst(), done, pool);
>         }
>         return r == null ? null : r.getFirst(); // great we got an answer
>       }
>     } catch (ExecutionException e) {
>       RpcRetryingCallerWithReadReplicas.throwEnrichedException(e, retries);
>     } catch (CancellationException e) {
>       throw new InterruptedIOException(e.getMessage());
>     } catch (InterruptedException e) {
>       throw new InterruptedIOException(e.getMessage());
>     } catch (TimeoutException e) {
>       throw new InterruptedIOException(e.getMessage());
>     } finally {
>       // We get there because we were interrupted or because one or more of the
>       // calls succeeded or failed. In all case, we stop all our tasks.
>       cs.cancelAll();
>     }
>     return null; // unreachable
>  {code}
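
The snippet above falls through to {{return null}} when {{cs.poll}} times out, so the caller cannot tell "no replica answered in time" from "the scan is finished". One possible way to remove that ambiguity is to raise an exception on timeout. The sketch below shows the idea with plain JDK classes; it is an assumption for illustration, not necessarily what the attached patches do, and {{PollTimeoutSketch}}, {{firstResultOrFail}} and {{demo}} are hypothetical names.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Illustration only: String[] stands in for Result[], JDK classes for the HBase ones.
class PollTimeoutSketch {

    /**
     * Returns the first completed result. A null return still means
     * "the task itself returned null"; a timeout is reported as an IOException.
     */
    static String[] firstResultOrFail(CompletionService<String[]> cs, long timeoutMs)
            throws IOException {
        try {
            Future<String[]> f = cs.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (f == null) {
                // Timed out: do not fall through to "return null".
                throw new IOException("No result within " + timeoutMs + " ms");
            }
            return f.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new InterruptedIOException(e.getMessage());
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        }
    }

    /** Runs one simulated call and reports "timeout" or the number of rows. */
    static String demo(long taskMs, long timeoutMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CompletionService<String[]> cs = new ExecutorCompletionService<>(pool);
            cs.submit(() -> {
                Thread.sleep(taskMs); // simulates a busy server
                return new String[] {"r1", "r2"};
            });
            String[] rows = firstResultOrFail(cs, timeoutMs);
            return "rows=" + rows.length;
        } catch (IOException e) {
            return "timeout";
        } finally {
            pool.shutdownNow(); // cancel whatever is still running
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(2000, 50));
        System.out.println(demo(10, 2000));
    }
}
```

With the timeout surfaced as an exception, a retrying caller can try again or fail the scan instead of silently treating the timeout as the end of the data.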



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)