[
https://issues.apache.org/jira/browse/HBASE-21941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Guanghao Zhang updated HBASE-21941:
-----------------------------------
Description:
There are hbase.rpc.timeout and hbase.client.operation.timeout for client
operation expect scan. And there is a special config
hbase.client.scanner.timeout.period for scan. If I am not wrong, this should
rpc timeout of scan call. But now we use this as operation timeout of scan
call. The scan callable is complicated as we need handle the replica case. The
real call with retry is called in
[https://github.com/apache/hbase/blob/9a55cbb2c1dfe5a13a6ceb323ac7edd23532f4b5/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ResultBoundedCompletionService.java#L80|https://github.com/apache/hbase/blob/9a55cbb2c1dfe5a13a6ceb323ac7edd23532f4b5/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ResultBoundedCompletionService.java#L80.]
. And the callTimeout is configed by hbase.client.scanner.timeout.period. So I
thought this is not right.
I meet this problem when run ITBLL for branch-2.2. The verify map task failed
when scan.
{noformat}
2019-02-21 03:47:20,287 INFO [main]
org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=16, exceptions:
2019-02-21 03:47:20,287 INFO [main]
org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: Closing the previously
opened scanner object. 2019-02-21 03:47:20,331 INFO [main]
org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: Current
scan={"loadColumnFamiliesOnDemand":null,"startRow":"\\xE1\\x9B\\xB4\\xF0\\xB3(JT\\xDC\\x86pf|y\\xF3\\xE9","stopRow":"","batch":-1,"cacheBlocks":false,"totalColumns":3,"maxResultSize":4194304,"families":{"big":["big"],"meta":["prev"],"tiny":["tiny"]},"caching":10000,"maxVersions":1,"timeRange":[0,9223372036854775807]}
2019-02-21 03:47:20,335 INFO
[hconnection-0x7b44b63d-metaLookup-shared--pool4-t36]
org.apache.hadoop.hbase.client.ScannerCallable: Open
scanner=-4916858472898750097 for
scan={"loadColumnFamiliesOnDemand":null,"startRow":"IntegrationTestBigLinkedList,\\xE1\\x9B\\xB4\\xF0\\xB3(JT\\xDC\\x86pf|y\\xF3\\xE9,99999999999999","stopRow":"IntegrationTestBigLinkedList,,","batch":-1,"cacheBlocks":true,"totalColumns":1,"maxResultSize":-1,"families":{"info":["ALL"]},"caching":5,"maxVersions":1,"timeRange":[0,9223372036854775807]}
on region region=hbase:meta,,1.1588230740,
hostname=c4-hadoop-tst-st26.bj,29100,1550660298519, seqNum=-1 2019-02-21
03:48:20,354 INFO [main]
org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: Mapper took 60023ms to
process 0 rows 2019-02-21 03:48:20,355 INFO [main]
org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=16, exceptions: Thu Feb 21 03:48:20 CST 2019, null,
java.net.SocketTimeoutException: callTimeout=60000, callDuration=60215: Call to
c4-hadoop-tst-st30.bj/10.132.2.41:29100 failed on local exception:
org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=7102, waitTime=60006,
rpcTimeout=60000 row 'ᛴ�(JT܆pf|y��' on table 'IntegrationTestBigLinkedList' at
region=IntegrationTestBigLinkedList,\xDD\xDD\xDD\xDD\xDD\xDD\xDD\xDD,1550661322522.d5d29d2f1e8fee42d666c117709c3a46.,
hostname=c4-hadoop-tst-st30.bj,29100,1550652984371, seqNum=1007960
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=16, exceptions: Thu Feb 21 03:48:20 CST 2019, null,
java.net.SocketTimeoutException: callTimeout=60000, callDuration=60215: Call to
c4-hadoop-tst-st30.bj/10.132.2.41:29100 failed on local exception:
org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=7102, waitTime=60006,
rpcTimeout=60000 row 'ᛴ�(JT܆pf|y��' on table 'IntegrationTestBigLinkedList' at
region=IntegrationTestBigLinkedList,\xDD\xDD\xDD\xDD\xDD\xDD\xDD\xDD,1550661322522.d5d29d2f1e8fee42d666c117709c3a46.,
hostname=c4-hadoop-tst-st30.bj,29100,1550652984371, seqNum=1007960 at
org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:299)
at
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:242)
at
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
at
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:266)
at
org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:434)
at
org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:309)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:594)
at
org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:237){noformat}
> Use operation timeout instead of scanner timeout for scan call
> --------------------------------------------------------------
>
> Key: HBASE-21941
> URL: https://issues.apache.org/jira/browse/HBASE-21941
> Project: HBase
> Issue Type: Bug
> Reporter: Guanghao Zhang
> Priority: Major
>
> There are hbase.rpc.timeout and hbase.client.operation.timeout for client
> operation expect scan. And there is a special config
> hbase.client.scanner.timeout.period for scan. If I am not wrong, this should
> rpc timeout of scan call. But now we use this as operation timeout of scan
> call. The scan callable is complicated as we need handle the replica case.
> The real call with retry is called in
> [https://github.com/apache/hbase/blob/9a55cbb2c1dfe5a13a6ceb323ac7edd23532f4b5/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ResultBoundedCompletionService.java#L80|https://github.com/apache/hbase/blob/9a55cbb2c1dfe5a13a6ceb323ac7edd23532f4b5/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ResultBoundedCompletionService.java#L80.]
> . And the callTimeout is configed by hbase.client.scanner.timeout.period. So
> I thought this is not right.
>
> I meet this problem when run ITBLL for branch-2.2. The verify map task failed
> when scan.
> {noformat}
> 2019-02-21 03:47:20,287 INFO [main]
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=16, exceptions:
> 2019-02-21 03:47:20,287 INFO [main]
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: Closing the
> previously opened scanner object. 2019-02-21 03:47:20,331 INFO [main]
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: Current
> scan={"loadColumnFamiliesOnDemand":null,"startRow":"\\xE1\\x9B\\xB4\\xF0\\xB3(JT\\xDC\\x86pf|y\\xF3\\xE9","stopRow":"","batch":-1,"cacheBlocks":false,"totalColumns":3,"maxResultSize":4194304,"families":{"big":["big"],"meta":["prev"],"tiny":["tiny"]},"caching":10000,"maxVersions":1,"timeRange":[0,9223372036854775807]}
> 2019-02-21 03:47:20,335 INFO
> [hconnection-0x7b44b63d-metaLookup-shared--pool4-t36]
> org.apache.hadoop.hbase.client.ScannerCallable: Open
> scanner=-4916858472898750097 for
> scan={"loadColumnFamiliesOnDemand":null,"startRow":"IntegrationTestBigLinkedList,\\xE1\\x9B\\xB4\\xF0\\xB3(JT\\xDC\\x86pf|y\\xF3\\xE9,99999999999999","stopRow":"IntegrationTestBigLinkedList,,","batch":-1,"cacheBlocks":true,"totalColumns":1,"maxResultSize":-1,"families":{"info":["ALL"]},"caching":5,"maxVersions":1,"timeRange":[0,9223372036854775807]}
> on region region=hbase:meta,,1.1588230740,
> hostname=c4-hadoop-tst-st26.bj,29100,1550660298519, seqNum=-1 2019-02-21
> 03:48:20,354 INFO [main]
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: Mapper took 60023ms
> to process 0 rows 2019-02-21 03:48:20,355 INFO [main]
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=16, exceptions: Thu Feb 21 03:48:20 CST 2019, null,
> java.net.SocketTimeoutException: callTimeout=60000, callDuration=60215: Call
> to c4-hadoop-tst-st30.bj/10.132.2.41:29100 failed on local exception:
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=7102,
> waitTime=60006, rpcTimeout=60000 row 'ᛴ�(JT܆pf|y��' on table
> 'IntegrationTestBigLinkedList' at
> region=IntegrationTestBigLinkedList,\xDD\xDD\xDD\xDD\xDD\xDD\xDD\xDD,1550661322522.d5d29d2f1e8fee42d666c117709c3a46.,
> hostname=c4-hadoop-tst-st30.bj,29100,1550652984371, seqNum=1007960
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=16, exceptions: Thu Feb 21 03:48:20 CST 2019, null,
> java.net.SocketTimeoutException: callTimeout=60000, callDuration=60215: Call
> to c4-hadoop-tst-st30.bj/10.132.2.41:29100 failed on local exception:
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=7102,
> waitTime=60006, rpcTimeout=60000 row 'ᛴ�(JT܆pf|y��' on table
> 'IntegrationTestBigLinkedList' at
> region=IntegrationTestBigLinkedList,\xDD\xDD\xDD\xDD\xDD\xDD\xDD\xDD,1550661322522.d5d29d2f1e8fee42d666c117709c3a46.,
> hostname=c4-hadoop-tst-st30.bj,29100,1550652984371, seqNum=1007960 at
> org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:299)
> at
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:242)
> at
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
> at
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192)
> at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:266)
> at
> org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:434)
> at
> org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:309)
> at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:594)
> at
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:237){noformat}
>
>
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)