Wei-Chiu Chuang created HDDS-10497:
--------------------------------------
Summary: [hsync] Refresh block token immediately if block token
expires
Key: HDDS-10497
URL: https://issues.apache.org/jira/browse/HDDS-10497
Project: Apache Ozone
Issue Type: Sub-task
Reporter: Wei-Chiu Chuang
HDDS-9734 and HDDS-7930 improve error handling when an input stream fails to
read due to an expired block token. But the block token is refreshed only after
every datanode in the pipeline has been retried, which not only adds log spew
but also increases 99.9th percentile tail latency.
The input stream should request a new block token immediately after it
encounters an expired block token.
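A minimal sketch of the proposed behavior, not the actual Ozone code. It
assumes an expired token is rejected the same way by every datanode, so the
client refreshes once and retries the same datanode instead of moving on.
TokenRefreshSketch, ReadException, ChunkReader and readChunk are hypothetical
names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class TokenRefreshSketch {
  /** Hypothetical stand-in for a failed read; not an Ozone class. */
  static class ReadException extends Exception {
    final boolean tokenExpired;

    ReadException(String message, boolean tokenExpired) {
      super(message);
      this.tokenExpired = tokenExpired;
    }
  }

  /** Hypothetical read call against a single datanode. */
  interface ChunkReader {
    byte[] read(String datanode, String token) throws ReadException;
  }

  /**
   * Tries each datanode in order. On the first expired-token failure the
   * token is refreshed right away and the same datanode is retried. Any
   * other failure moves on to the next datanode. Every attempt is recorded
   * in 'attempts' so the retry order can be inspected.
   */
  static byte[] readChunk(List<String> datanodes, String token,
      Supplier<String> tokenRefresher, ChunkReader reader,
      List<String> attempts) throws ReadException {
    boolean refreshed = false;
    ReadException last = null;
    int i = 0;
    while (i < datanodes.size()) {
      String dn = datanodes.get(i);
      attempts.add(dn);
      try {
        return reader.read(dn, token);
      } catch (ReadException e) {
        last = e;
        if (e.tokenExpired && !refreshed) {
          // Refresh at most once, then retry the same datanode.
          token = tokenRefresher.get();
          refreshed = true;
          continue;
        }
        i++; // non-token failure, or token already refreshed
      }
    }
    throw last != null ? last : new ReadException("no datanodes", false);
  }
}
```

With this ordering, an expired token costs one failed request and one refresh
rather than one failed request per datanode in the pipeline.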
Relevant logs:
{noformat}
2024-03-08 23:03:20,109 WARN
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls: Failed to read chunk
113750153625603061_chunk_1 (len=1048576) conID: 4 locID: 113750153625603061
bcsId: 129941 from
5fa1d092-1f11-4f6e-af4a-cf2785a8cae4(ccycloud-1.weichiu-hbase.root.comops.site/10.140.131.133);
will try another datanode.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
BLOCK_TOKEN_VERIFICATION_FAILED for null: Expired token for user: hbase
(auth:SIMPLE)
at
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:675)
at
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$createValidators$4(ContainerProtocolCalls.java:686)
at
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:400)
at
org.apache.hadoop.hdds.scm.XceiverClientGrpc.lambda$sendCommandWithTraceIDAndRetry$0(XceiverClientGrpc.java:340)
at
org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:159)
at
org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
at
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:335)
at
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:316)
at
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:358)
at
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$readChunk$2(ContainerProtocolCalls.java:345)
at
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.tryEachDatanode(ContainerProtocolCalls.java:147)
at
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:344)
at
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:425)
at
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkDataIntoBuffers(ChunkInputStream.java:402)
at
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:387)
at
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:319)
at
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:173)
at
org.apache.hadoop.hdds.scm.storage.ByteArrayReader.readFromBlock(ByteArrayReader.java:54)
at
org.apache.hadoop.hdds.scm.storage.BlockInputStream.readWithStrategy(BlockInputStream.java:367)
...
2024-03-08 23:03:20,112 ERROR org.apache.hadoop.hdds.scm.XceiverClientGrpc:
Failed to execute command ReadChunk on the pipeline Pipeline[ Id:
04646212-c013-4f8c-9ada-80580c189135, Nodes:
5fa1d092-1f11-4f6e-af4a-cf2785a8cae4(ccycloud-1.weichiu-hbase.root.comops.site/10.140.131.133)98e5528d-c790-465e-91e0-f47d4cabe3bc(ccycloud-3.weichiu-hbase.root.comops.site/10.140.103.18)0238996a-9361-4b83-aaa8-e99fd9523ad0(ccycloud-2.weichiu-hbase.root.comops.site/10.140.135.20),
ReplicationConfig: STANDALONE/THREE, State:OPEN,
leaderId:98e5528d-c790-465e-91e0-f47d4cabe3bc,
CreationTimestamp2024-03-08T18:59:15.755Z[UTC]].
2024-03-08 23:03:20,113 WARN
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls: Failed to read chunk
113750153625603061_chunk_1 (len=1048576) conID: 4 locID: 113750153625603061
bcsId: 129941 from
98e5528d-c790-465e-91e0-f47d4cabe3bc(ccycloud-3.weichiu-hbase.root.comops.site/10.140.103.18);
will try another datanode.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
BLOCK_TOKEN_VERIFICATION_FAILED for null: Expired token for user: hbase
(auth:SIMPLE)
at
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:675)
at
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$createValidators$4(ContainerProtocolCalls.java:686)
at
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:400)
at
org.apache.hadoop.hdds.scm.XceiverClientGrpc.lambda$sendCommandWithTraceIDAndRetry$0(XceiverClientGrpc.java:340)
at
org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:159)
at
org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
at
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:335)
at
org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:316)
at
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:358)
at
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$readChunk$2(ContainerProtocolCalls.java:345)
at
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.tryEachDatanode(ContainerProtocolCalls.java:147)
at
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:344)
...
2024-03-08 23:03:20,116 ERROR org.apache.hadoop.hdds.scm.XceiverClientGrpc:
Failed to execute command ReadChunk on the pipeline Pipeline[ Id:
04646212-c013-4f8c-9ada-80580c189135, Nodes:
5fa1d092-1f11-4f6e-af4a-cf2785a8cae4(ccycloud-1.weichiu-hbase.root.comops.site/10.140.131.133)98e5528d-c790-465e-91e0-f47d4cabe3bc(ccycloud-3.weichiu-hbase.root.comops.site/10.140.103.18)0238996a-9361-4b83-aaa8-e99fd9523ad0(ccycloud-2.weichiu-hbase.root.comops.site/10.140.135.20),
ReplicationConfig: STANDALONE/THREE, State:OPEN,
leaderId:98e5528d-c790-465e-91e0-f47d4cabe3bc,
CreationTimestamp2024-03-08T18:59:15.755Z[UTC]].
2024-03-08 23:03:20,390 INFO
org.apache.hadoop.hdds.scm.storage.BlockInputStream: Unable to read information
for block conID: 3 locID: 113750153625603098 bcsId: 459126 from pipeline
PipelineID=eb1d2690-75a6-48d7-9eec-60675b907fc0:
BLOCK_TOKEN_VERIFICATION_FAILED for null: Expired token for user: hbase
(auth:SIMPLE)
{noformat}