[ 
https://issues.apache.org/jira/browse/HDDS-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen resolved HDDS-10497.
-------------------------------
    Fix Version/s: 1.5.0
       Resolution: Fixed

> [hsync] Refresh block token immediately if block token expires
> --------------------------------------------------------------
>
>                 Key: HDDS-10497
>                 URL: https://issues.apache.org/jira/browse/HDDS-10497
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.5.0
>
>
> HDDS-9734 and HDDS-7930 improve error handling when the input stream fails to 
> read due to an expired block token. However, the block token is only refreshed 
> after retrying every datanode in the pipeline, which not only adds log spew but 
> also increases the 99.9% tail latency.
> The input stream should instead request a new block token immediately after it 
> encounters an expired block token (a sketch of this retry shape follows the 
> logs below).
> Relevant logs:
> {noformat}
> 2024-03-08 23:03:20,109 WARN 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls: Failed to read 
> chunk 113750153625603061_chunk_1 (len=1048576) conID: 4 locID: 
> 113750153625603061 bcsId: 129941 from 
> 5fa1d092-1f11-4f6e-af4a-cf2785a8cae4(ccycloud-1.weichiu-hbase.root.comops.site/10.140.131.133);
>  will try another datanode.
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  BLOCK_TOKEN_VERIFICATION_FAILED for null: Expired token for user: hbase 
> (auth:SIMPLE)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:675)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$createValidators$4(ContainerProtocolCalls.java:686)
>         at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:400)
>         at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.lambda$sendCommandWithTraceIDAndRetry$0(XceiverClientGrpc.java:340)
>         at 
> org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:159)
>         at 
> org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
>         at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:335)
>         at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:316)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:358)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$readChunk$2(ContainerProtocolCalls.java:345)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.tryEachDatanode(ContainerProtocolCalls.java:147)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:344)
>         at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:425)
>         at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkDataIntoBuffers(ChunkInputStream.java:402)
>         at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:387)
>         at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:319)
>         at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:173)
>         at 
> org.apache.hadoop.hdds.scm.storage.ByteArrayReader.readFromBlock(ByteArrayReader.java:54)
>         at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.readWithStrategy(BlockInputStream.java:367)
> ...
> 2024-03-08 23:03:20,112 ERROR org.apache.hadoop.hdds.scm.XceiverClientGrpc: 
> Failed to execute command ReadChunk on the pipeline Pipeline[ Id: 
> 04646212-c013-4f8c-9ada-80580c189135, Nodes: 
> 5fa1d092-1f11-4f6e-af4a-cf2785a8cae4(ccycloud-1.weichiu-hbase.root.comops.site/10.140.131.133)98e5528d-c790-465e-91e0-f47d4cabe3bc(ccycloud-3.weichiu-hbase.root.comops.site/10.140.103.18)0238996a-9361-4b83-aaa8-e99fd9523ad0(ccycloud-2.weichiu-hbase.root.comops.site/10.140.135.20),
>  ReplicationConfig: STANDALONE/THREE, State:OPEN, 
> leaderId:98e5528d-c790-465e-91e0-f47d4cabe3bc, 
> CreationTimestamp2024-03-08T18:59:15.755Z[UTC]].
> 2024-03-08 23:03:20,113 WARN 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls: Failed to read 
> chunk 113750153625603061_chunk_1 (len=1048576) conID: 4 locID: 
> 113750153625603061 bcsId: 129941 from 
> 98e5528d-c790-465e-91e0-f47d4cabe3bc(ccycloud-3.weichiu-hbase.root.comops.site/10.140.103.18);
>  will try another datanode.
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  BLOCK_TOKEN_VERIFICATION_FAILED for null: Expired token for user: hbase 
> (auth:SIMPLE)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:675)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$createValidators$4(ContainerProtocolCalls.java:686)
>         at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:400)
>         at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.lambda$sendCommandWithTraceIDAndRetry$0(XceiverClientGrpc.java:340)
>         at 
> org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:159)
>         at 
> org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
>         at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:335)
>         at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:316)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:358)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$readChunk$2(ContainerProtocolCalls.java:345)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.tryEachDatanode(ContainerProtocolCalls.java:147)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:344)
> ...
> 2024-03-08 23:03:20,116 ERROR org.apache.hadoop.hdds.scm.XceiverClientGrpc: 
> Failed to execute command ReadChunk on the pipeline Pipeline[ Id: 
> 04646212-c013-4f8c-9ada-80580c189135, Nodes: 
> 5fa1d092-1f11-4f6e-af4a-cf2785a8cae4(ccycloud-1.weichiu-hbase.root.comops.site/10.140.131.133)98e5528d-c790-465e-91e0-f47d4cabe3bc(ccycloud-3.weichiu-hbase.root.comops.site/10.140.103.18)0238996a-9361-4b83-aaa8-e99fd9523ad0(ccycloud-2.weichiu-hbase.root.comops.site/10.140.135.20),
>  ReplicationConfig: STANDALONE/THREE, State:OPEN, 
> leaderId:98e5528d-c790-465e-91e0-f47d4cabe3bc, 
> CreationTimestamp2024-03-08T18:59:15.755Z[UTC]].
> 2024-03-08 23:03:20,390 INFO 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream: Unable to read 
> information for block conID: 3 locID: 113750153625603098 bcsId: 459126 from 
> pipeline PipelineID=eb1d2690-75a6-48d7-9eec-60675b907fc0: 
> BLOCK_TOKEN_VERIFICATION_FAILED for null: Expired token for user: hbase 
> (auth:SIMPLE)
> {noformat}
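
For context, here is a minimal sketch of the retry shape the description argues for: when a read fails because the block token has expired, refresh the token once and retry the same datanode, rather than first cycling through every datanode in the pipeline (the token is expired for all of them anyway). The {{ChunkReader}}, {{TokenRefresher}} and {{ExpiredTokenException}} types below are hypothetical stand-ins, not the actual Ozone client API.

{code:java}
import java.io.IOException;
import java.util.List;

public final class EagerTokenRefreshSketch {

  /** Hypothetical stand-in for the client-side chunk read call. */
  interface ChunkReader {
    byte[] readChunk(String datanode, String blockToken) throws IOException;
  }

  /** Hypothetical stand-in for fetching a fresh block token. */
  interface TokenRefresher {
    String refreshBlockToken() throws IOException;
  }

  /** Hypothetical exception for a datanode rejecting an expired token. */
  static final class ExpiredTokenException extends IOException {
    ExpiredTokenException(String msg) { super(msg); }
  }

  static byte[] readWithEagerRefresh(List<String> pipeline,
                                     String blockToken,
                                     ChunkReader reader,
                                     TokenRefresher refresher) throws IOException {
    IOException lastFailure = null;
    for (String datanode : pipeline) {
      try {
        return reader.readChunk(datanode, blockToken);
      } catch (ExpiredTokenException e) {
        // Eager path: an expired token will be rejected by every datanode, so
        // retrying the rest of the pipeline only adds log spew and tail
        // latency. Refresh once and retry the same datanode immediately.
        blockToken = refresher.refreshBlockToken();
        return reader.readChunk(datanode, blockToken);
      } catch (IOException e) {
        // Other failures (datanode down, network error, etc.) still fall
        // through to the next datanode in the pipeline, as before.
        lastFailure = e;
      }
    }
    throw lastFailure != null
        ? lastFailure
        : new IOException("Pipeline is empty, nothing to read from");
  }
}
{code}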



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
