[
https://issues.apache.org/jira/browse/HBASE-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860417#action_12860417
]
Jean-Daniel Cryans commented on HBASE-2481:
-------------------------------------------
This was caused by HBASE-1671, which changed ScannerCallable as follows:
{code}
 public Result [] call() throws IOException {
   if (scannerId != -1L && closed) {
-    server.close(scannerId);
-    scannerId = -1L;
+    close();
   } else if (scannerId == -1L && !closed) {
-    // open the scanner
-    scannerId = openScanner();
+    this.scannerId = openScanner();
   } else {
-    Result [] rrs = server.next(scannerId, caching);
+    Result [] rrs = null;
+    try {
+      rrs = server.next(scannerId, caching);
+    } catch (IOException e) {
+      IOException ioe = null;
+      if (e instanceof RemoteException) {
+        ioe = RemoteExceptionHandler.decodeRemoteException((RemoteException)e);
+      }
+      if (ioe != null && ioe instanceof NotServingRegionException) {
+        // Throw a DNRE so that we break out of cycle of calling NSRE
+        // when what we need is to open scanner against new location.
+        // Attach NSRE to signal client that it needs to resetup scanner.
+        throw new DoNotRetryIOException("Reset scanner", ioe);
+      }
+    }
     return rrs == null || rrs.length == 0? null: rrs;
   }
{code}
We now eat the exception if it's not an NSRE. Rethrowing it when it is a
DoNotRetryIOException is the right thing to do, but the client code is still
broken. In HTable.ClientScanner.next:
{code}
try {
  // Server returns a null values if scanning is to stop. Else,
  // returns an empty array if scanning is to go on and we've just
  // exhausted current region.
  values = getConnection().getRegionServerWithRetries(callable);
  if (skipFirst) {
    skipFirst = false;
    // Reget.
    values = getConnection().getRegionServerWithRetries(callable);
  }
} catch (DoNotRetryIOException e) {
  Throwable cause = e.getCause();
  if (cause == null || !(cause instanceof NotServingRegionException)) {
    throw e;
  }
  // Else, its signal from depths of ScannerCallable that we got an
  // NSRE on a next and that we need to reset the scanner.
  if (this.lastResult != null) {
    this.scan.setStartRow(this.lastResult.getRow());
    // Skip first row returned. We already let it out on previous
    // invocation.
    skipFirst = true;
  }
  // Clear region
  this.currentRegion = null;
  continue;
} catch (IOException e) {
  if (e instanceof UnknownScannerException &&
      lastNext + scannerTimeout < System.currentTimeMillis()) {
    ScannerTimeoutException ex = new ScannerTimeoutException();
    ex.initCause(e);
    throw ex;
  }
  throw e;
}
{code}
We catch DoNotRetryIOException first, and only in the second catch clause do we
check for UnknownScannerException, which extends DoNotRetryIOException. That
check is therefore unreachable, so ScannerTimeoutException is never thrown!
Easy fix.
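To make the catch-ordering problem concrete, here is a minimal, self-contained sketch. The exception classes are hypothetical stand-ins that copy only the inheritance described above (they are not the real HBase classes), and `timeoutCheckFirst` is just one possible reordering, not necessarily the fix that will be committed.

```java
import java.io.IOException;

class CatchOrderDemo {
    // Hypothetical stand-ins: only the class hierarchy matters for this sketch.
    static class DoNotRetryIOException extends IOException {
        DoNotRetryIOException(String msg) { super(msg); }
    }
    static class UnknownScannerException extends DoNotRetryIOException {
        UnknownScannerException(String msg) { super(msg); }
    }
    static class NotServingRegionException extends IOException {
        NotServingRegionException(String msg) { super(msg); }
    }
    // Simplified: extends IOException directly in this sketch.
    static class ScannerTimeoutException extends IOException {
        ScannerTimeoutException(String msg, Throwable cause) { super(msg, cause); }
    }

    // Same clause order as the excerpt: the DoNotRetryIOException clause comes
    // first, so an UnknownScannerException never reaches the second clause.
    static String currentOrder(IOException fromServer, boolean timedOut) {
        try {
            try {
                throw fromServer;
            } catch (DoNotRetryIOException e) {
                if (!(e.getCause() instanceof NotServingRegionException)) {
                    throw e; // an UnknownScannerException leaves here unchanged
                }
                return "reset scanner";
            } catch (IOException e) {
                if (e instanceof UnknownScannerException && timedOut) {
                    throw new ScannerTimeoutException("scanner timed out", e);
                }
                throw e;
            }
        } catch (IOException surfaced) {
            // Report which exception type the caller would actually see.
            return surfaced.getClass().getSimpleName();
        }
    }

    // One possible reordering (an assumption): run the timeout check before
    // the generic DoNotRetryIOException handling.
    static String timeoutCheckFirst(IOException fromServer, boolean timedOut) {
        try {
            try {
                throw fromServer;
            } catch (DoNotRetryIOException e) {
                if (e instanceof UnknownScannerException && timedOut) {
                    throw new ScannerTimeoutException("scanner timed out", e);
                }
                if (!(e.getCause() instanceof NotServingRegionException)) {
                    throw e;
                }
                return "reset scanner";
            }
        } catch (IOException surfaced) {
            return surfaced.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        IOException use = new UnknownScannerException("lease expired");
        System.out.println(currentOrder(use, true));      // UnknownScannerException
        System.out.println(timeoutCheckFirst(use, true)); // ScannerTimeoutException
    }
}
```

With the current clause order the caller sees the raw UnknownScannerException even when the timeout has elapsed; with the check moved into the first clause it sees ScannerTimeoutException instead.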
> Client is not getting UnknownScannerExceptions; they are being eaten
> --------------------------------------------------------------------
>
> Key: HBASE-2481
> URL: https://issues.apache.org/jira/browse/HBASE-2481
> Project: Hadoop HBase
> Issue Type: Bug
> Affects Versions: 0.20.4
> Reporter: stack
> Priority: Blocker
>
> This was reported by mudphone on IRC and confirmed by myself in a quick test.
> If the client takes too long going back to the RS, the RS will throw an
> UnknownScannerException, but it doesn't get back to the client. Instead, the
> client scan silently ends. Marking this a blocker. It's actually in 0.20.4;
> that's what I was testing. Mayhaps an RC sinker?