[
https://issues.apache.org/jira/browse/HBASE-21941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775236#comment-16775236
]
Peter Somogyi commented on HBASE-21941:
---------------------------------------
Sorry, I'm not really following what the actual problem is.
[~zghaobac] Do you recommend to increase the default
hbase.client.scanner.timeout.period?
> Increment the default scanner timeout
> -------------------------------------
>
> Key: HBASE-21941
> URL: https://issues.apache.org/jira/browse/HBASE-21941
> Project: HBase
> Issue Type: Bug
> Reporter: Guanghao Zhang
> Priority: Major
>
> There are hbase.rpc.timeout and hbase.client.operation.timeout for client
> operation expect scan. And there is a special config
> hbase.client.scanner.timeout.period for scan. If I am not wrong, this should
> rpc timeout of scan call. But now we use this as operation timeout of scan
> call. The scan callable is complicated as we need handle the replica case.
> The real call with retry is called in
> [https://github.com/apache/hbase/blob/9a55cbb2c1dfe5a13a6ceb323ac7edd23532f4b5/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ResultBoundedCompletionService.java#L80|https://github.com/apache/hbase/blob/9a55cbb2c1dfe5a13a6ceb323ac7edd23532f4b5/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ResultBoundedCompletionService.java#L80.]
> . And the callTimeout is configed by hbase.client.scanner.timeout.period. So
> I thought this is not right.
>
> I meet this problem when run ITBLL for branch-2.2. The verify map task failed
> when scan.
> {code:java}
> 2019-02-21 03:47:20,287 INFO [main]
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=16, exceptions:
> 2019-02-21 03:47:20,287 INFO [main]
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: Closing the
> previously opened scanner object.
> 2019-02-21 03:47:20,331 INFO [main]
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: Current
> scan={"loadColumnFamiliesOnDemand":null,"startRow":"\\xE1\\x9B\\xB4\\xF0\\xB3(JT\\xDC\\x86pf|y\\xF3\\xE9","stopRow":"","batch":-1,"cacheBlocks":false,"totalColumns":3,"maxResultSize":4194304,"families":{"big":["big"],"meta":["prev"],"tiny":["tiny"]},"caching":10000,"maxVersions":1,"timeRange":[0,9223372036854775807]}
> 2019-02-21 03:47:20,335 INFO
> [hconnection-0x7b44b63d-metaLookup-shared--pool4-t36]
> org.apache.hadoop.hbase.client.ScannerCallable: Open
> scanner=-4916858472898750097 for
> scan={"loadColumnFamiliesOnDemand":null,"startRow":"IntegrationTestBigLinkedList,\\xE1\\x9B\\xB4\\xF0\\xB3(JT\\xDC\\x86pf|y\\xF3\\xE9,99999999999999","stopRow":"IntegrationTestBigLinkedList,,","batch":-1,"cacheBlocks":true,"totalColumns":1,"maxResultSize":-1,"families":{"info":["ALL"]},"caching":5,"maxVersions":1,"timeRange":[0,9223372036854775807]}
> on region region=hbase:meta,,1.1588230740,
> hostname=c4-hadoop-tst-st26.bj,29100,1550660298519, seqNum=-1
> 2019-02-21 03:48:20,354 INFO [main]
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: Mapper took 60023ms
> to process 0 rows
> 2019-02-21 03:48:20,355 INFO [main]
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=16, exceptions: Thu Feb 21 03:48:20 CST 2019, null,
> java.net.SocketTimeoutException: callTimeout=60000, callDuration=60215: Call
> to c4-hadoop-tst-st30.bj/10.132.2.41:29100 failed on local exception:
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=7102,
> waitTime=60006, rpcTimeout=60000 row 'ᛴ�(JT܆pf|y��' on table
> 'IntegrationTestBigLinkedList' at
> region=IntegrationTestBigLinkedList,\xDD\xDD\xDD\xDD\xDD\xDD\xDD\xDD,1550661322522.d5d29d2f1e8fee42d666c117709c3a46.,
> hostname=c4-hadoop-tst-st30.bj,29100,1550652984371, seqNum=1007960
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=16, exceptions: Thu Feb 21 03:48:20 CST 2019, null,
> java.net.SocketTimeoutException: callTimeout=60000, callDuration=60215: Call
> to c4-hadoop-tst-st30.bj/10.132.2.41:29100 failed on local exception:
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=7102,
> waitTime=60006, rpcTimeout=60000 row 'ᛴ�(JT܆pf|y��' on table
> 'IntegrationTestBigLinkedList' at
> region=IntegrationTestBigLinkedList,\xDD\xDD\xDD\xDD\xDD\xDD\xDD\xDD,1550661322522.d5d29d2f1e8fee42d666c117709c3a46.,
> hostname=c4-hadoop-tst-st30.bj,29100,1550652984371, seqNum=1007960 at
> org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:299)
> at
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:242)
> at
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
> at
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192)
> at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:266)
> at
> org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:434)
> at
> org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:309)
> at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:594)
> at
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:237)
> {code}
>
> When TableRecordReaderImpl#nextKeyValue first scan faile, it recoverd by
> close the old scaner and open a new one. But the new scanner failed after 60
> seconds. Then it throw the exception and don't try to recovery and make the
> task failed.
>
> Or if the scanner timeout is a operation timeout of scan call, we should set
> the default scanner timeout to 1200000 which is same with the default
> operation timeout.
>
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)