[ 
https://issues.apache.org/jira/browse/HBASE-21941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773993#comment-16773993
 ] 

Guanghao Zhang commented on HBASE-21941:
----------------------------------------

Checked the code again. When scanner.call use 
RpcRetryingCallerImpl#callWithRetries, it use hbase.rpc.timeout as rpc timeout 
and use hbase.client.scanner.timeout.period as operation timeout.

> Increment the default scanner timeout
> -------------------------------------
>
>                 Key: HBASE-21941
>                 URL: https://issues.apache.org/jira/browse/HBASE-21941
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Guanghao Zhang
>            Priority: Major
>
> There are hbase.rpc.timeout and hbase.client.operation.timeout for client 
> operation expect scan. And there is a special config 
> hbase.client.scanner.timeout.period for scan. If I am not wrong, this should 
> rpc timeout of scan call. But now we use this as operation timeout of scan 
> call. The scan callable is complicated as we need handle the replica case. 
> The real call with retry is called in 
> [https://github.com/apache/hbase/blob/9a55cbb2c1dfe5a13a6ceb323ac7edd23532f4b5/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ResultBoundedCompletionService.java#L80|https://github.com/apache/hbase/blob/9a55cbb2c1dfe5a13a6ceb323ac7edd23532f4b5/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ResultBoundedCompletionService.java#L80.]
>  . And the callTimeout is configed by hbase.client.scanner.timeout.period. So 
> I thought this is not right.
>  
> I meet this problem when run ITBLL for branch-2.2. The verify map task failed 
> when scan.
> {code:java}
> 2019-02-21 03:47:20,287 INFO [main] 
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from 
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
> attempts=16, exceptions: 
> 2019-02-21 03:47:20,287 INFO [main] 
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: Closing the 
> previously opened scanner object.
> 2019-02-21 03:47:20,331 INFO [main] 
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: Current 
> scan={"loadColumnFamiliesOnDemand":null,"startRow":"\\xE1\\x9B\\xB4\\xF0\\xB3(JT\\xDC\\x86pf|y\\xF3\\xE9","stopRow":"","batch":-1,"cacheBlocks":false,"totalColumns":3,"maxResultSize":4194304,"families":{"big":["big"],"meta":["prev"],"tiny":["tiny"]},"caching":10000,"maxVersions":1,"timeRange":[0,9223372036854775807]}
>  2019-02-21 03:47:20,335 INFO 
> [hconnection-0x7b44b63d-metaLookup-shared--pool4-t36] 
> org.apache.hadoop.hbase.client.ScannerCallable: Open 
> scanner=-4916858472898750097 for 
> scan={"loadColumnFamiliesOnDemand":null,"startRow":"IntegrationTestBigLinkedList,\\xE1\\x9B\\xB4\\xF0\\xB3(JT\\xDC\\x86pf|y\\xF3\\xE9,99999999999999","stopRow":"IntegrationTestBigLinkedList,,","batch":-1,"cacheBlocks":true,"totalColumns":1,"maxResultSize":-1,"families":{"info":["ALL"]},"caching":5,"maxVersions":1,"timeRange":[0,9223372036854775807]}
>  on region region=hbase:meta,,1.1588230740, 
> hostname=c4-hadoop-tst-st26.bj,29100,1550660298519, seqNum=-1
> 2019-02-21 03:48:20,354 INFO [main] 
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: Mapper took 60023ms 
> to process 0 rows
> 2019-02-21 03:48:20,355 INFO [main] 
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: 
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
> attempts=16, exceptions: Thu Feb 21 03:48:20 CST 2019, null, 
> java.net.SocketTimeoutException: callTimeout=60000, callDuration=60215: Call 
> to c4-hadoop-tst-st30.bj/10.132.2.41:29100 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=7102, 
> waitTime=60006, rpcTimeout=60000 row 'ᛴ�(JT܆pf|y��' on table 
> 'IntegrationTestBigLinkedList' at 
> region=IntegrationTestBigLinkedList,\xDD\xDD\xDD\xDD\xDD\xDD\xDD\xDD,1550661322522.d5d29d2f1e8fee42d666c117709c3a46.,
>  hostname=c4-hadoop-tst-st30.bj,29100,1550652984371, seqNum=1007960 
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
> attempts=16, exceptions: Thu Feb 21 03:48:20 CST 2019, null, 
> java.net.SocketTimeoutException: callTimeout=60000, callDuration=60215: Call 
> to c4-hadoop-tst-st30.bj/10.132.2.41:29100 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=7102, 
> waitTime=60006, rpcTimeout=60000 row 'ᛴ�(JT܆pf|y��' on table 
> 'IntegrationTestBigLinkedList' at 
> region=IntegrationTestBigLinkedList,\xDD\xDD\xDD\xDD\xDD\xDD\xDD\xDD,1550661322522.d5d29d2f1e8fee42d666c117709c3a46.,
>  hostname=c4-hadoop-tst-st30.bj,29100,1550652984371, seqNum=1007960 at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:299)
>  at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:242)
>  at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
>  at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192)
>  at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:266) 
> at 
> org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:434)
>  at 
> org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:309)
>  at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:594) 
> at 
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:237)
> {code}
>  
> When TableRecordReaderImpl#nextKeyValue first scan faile, it recoverd by 
> close the old scaner and open a new one. But the new scanner failed after 60 
> seconds. Then it throw the exception and don't try to recovery and make the 
> task failed.
>  
> Or if the scanner timeout is a operation timeout of scan call, we should set 
> the default scanner timeout to 1200000 which is same with the default 
> operation timeout.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to