[ https://issues.apache.org/jira/browse/HBASE-16132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359972#comment-15359972 ]
Yu Li commented on HBASE-16132:
-------------------------------

Thanks for the further clarification; I get your point now. Both ScannerCallableWithReplicas and RpcRetryingCallerWithReadReplicas call {{Future.get}}. The main difference is that RpcRetryingCallerWithReadReplicas calls {{cs.take}} instead of {{cs.poll}} for the second replica, which means it blocks indefinitely on the second replica if the first one timed out. Since RpcRetryingCallerWithReadReplicas is used by get, and get is a special type of scan, I agree that it's better to follow the same approach in ScannerCallableWithReplicas.

Let me push this patch in first (to solve the problem) and open another JIRA for your proposal, sir (a new JIRA will be more visible, so we can better see others' thoughts :-)).

Thanks for pointing this out, [~devaraj]

> Scan does not return all the result when regionserver is busy
> -------------------------------------------------------------
>
>                 Key: HBASE-16132
>                 URL: https://issues.apache.org/jira/browse/HBASE-16132
>             Project: HBase
>          Issue Type: Bug
>            Reporter: binlijin
>            Assignee: binlijin
>         Attachments: HBASE-16132.patch, HBASE-16132_v2.patch,
>                      HBASE-16132_v3.patch, HBASE-16132_v3.patch, TestScanMissingData.java
>
>
> We have found a corner case: when a regionserver stays busy for a long
> time, some scanners may return null even though they have not scanned all the data.
> We found that ScannerCallableWithReplicas does not handle one case
> correctly: when cs.poll times out without returning any result, the method
> returns null, so the scan receives a null result and ends.
> {code}
>     try {
>       Future<Pair<Result[], ScannerCallable>> f = cs.poll(timeout, TimeUnit.MILLISECONDS);
>       if (f != null) {
>         Pair<Result[], ScannerCallable> r = f.get(timeout, TimeUnit.MILLISECONDS);
>         if (r != null && r.getSecond() != null) {
>           updateCurrentlyServingReplica(r.getSecond(), r.getFirst(), done, pool);
>         }
>         return r == null ? null : r.getFirst(); // great we got an answer
>       }
>     } catch (ExecutionException e) {
>       RpcRetryingCallerWithReadReplicas.throwEnrichedException(e, retries);
>     } catch (CancellationException e) {
>       throw new InterruptedIOException(e.getMessage());
>     } catch (InterruptedException e) {
>       throw new InterruptedIOException(e.getMessage());
>     } catch (TimeoutException e) {
>       throw new InterruptedIOException(e.getMessage());
>     } finally {
>       // We get there because we were interrupted or because one or more of the
>       // calls succeeded or failed. In all case, we stop all our tasks.
>       cs.cancelAll();
>     }
>     return null; // unreachable
> {code}
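
To make the {{cs.poll}} versus {{cs.take}} difference discussed in the comment concrete, here is a minimal sketch. It uses the JDK's java.util.concurrent.ExecutorCompletionService as a stand-in for the completion service in the HBase client, and the class and helper names (PollVersusTake, pollOrNull, pollOrThrow, takeBlocking, run) are hypothetical. Throwing on timeout is shown only as one possible way to surface the condition; it is not necessarily what the attached patches do.

```java
import java.io.InterruptedIOException;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PollVersusTake {

    // Mirrors the reported pattern: a poll timeout falls through to "return null".
    static String pollOrNull(CompletionService<String> cs, long timeoutMs) throws Exception {
        Future<String> f = cs.poll(timeoutMs, TimeUnit.MILLISECONDS);
        if (f != null) {
            return f.get();
        }
        return null; // the caller cannot tell a timeout from "no more data"
    }

    // Hypothetical alternative: report the timeout instead of returning null.
    static String pollOrThrow(CompletionService<String> cs, long timeoutMs) throws Exception {
        Future<String> f = cs.poll(timeoutMs, TimeUnit.MILLISECONDS);
        if (f == null) {
            throw new InterruptedIOException("no answer within " + timeoutMs + " ms");
        }
        return f.get();
    }

    // take() has no timeout: it blocks until some submitted task completes.
    static String takeBlocking(CompletionService<String> cs) throws Exception {
        return cs.take().get();
    }

    // Submits one slow task (a simulated busy server) and fetches its result
    // with the chosen strategy: "poll-null", "poll-throw" or "take".
    static String run(String strategy, long taskMs, long timeoutMs) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CompletionService<String> cs = new ExecutorCompletionService<>(pool);
            cs.submit(() -> {
                Thread.sleep(taskMs);
                return "row";
            });
            switch (strategy) {
                case "poll-null": return pollOrNull(cs, timeoutMs);
                case "poll-throw": return pollOrThrow(cs, timeoutMs);
                case "take": return takeBlocking(cs);
                default: throw new IllegalArgumentException(strategy);
            }
        } finally {
            pool.shutdownNow(); // stop the worker, as cs.cancelAll() does in the quoted code
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("poll, null on timeout: " + run("poll-null", 300, 50));
        System.out.println("take: " + run("take", 300, 50));
        try {
            run("poll-throw", 300, 50);
        } catch (InterruptedIOException e) {
            System.out.println("poll, throw on timeout: " + e.getMessage());
        }
    }
}
```

With a 300 ms task and a 50 ms poll timeout, the first strategy hands the caller a null that looks like the end of the scan, while the other two either wait for the row or make the timeout visible.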