[
https://issues.apache.org/jira/browse/HDDS-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wei-Chiu Chuang updated HDDS-11521:
-----------------------------------
Priority: Critical (was: Major)
> Race condition between pipeline close and block allocation causes client
> aborts
> -------------------------------------------------------------------------------
>
> Key: HDDS-11521
> URL: https://issues.apache.org/jira/browse/HDDS-11521
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Wei-Chiu Chuang
> Priority: Critical
>
> We had an HMaster abort prematurely. Looking at the relevant logs (HMaster,
> SCM), there appears to be a race condition: if a client is waiting to
> allocate a new block when the pipeline for that block is closed, the client
> waits for up to 60 seconds and then aborts without retrying.
> Expected behavior: the client should retry with another pipeline, either as
> soon as the pipeline is closed (preempting the wait) or after the 60-second
> timeout expires.
> Relevant log:
> Pipeline creation:
> {noformat}
> 2024-10-01 09:51:07,285 INFO [IPC Server handler 95 on
> 9863]-org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Sending
> CreatePipelineCommand for
> pipeline:PipelineID=48431096-9933-46d6-a462-abfc89ecd8b0 to
> datanode:b097b750-84ac-4aac-98b2-0917935b7cda
> 2024-10-01 09:51:07,285 INFO [IPC Server handler 95 on
> 9863]-org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Sending
> CreatePipelineCommand for
> pipeline:PipelineID=48431096-9933-46d6-a462-abfc89ecd8b0 to
> datanode:0abd3422-fb3b-48dc-9dfa-27978cc3e1d6
> 2024-10-01 09:51:07,285 INFO [IPC Server handler 95 on
> 9863]-org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Sending
> CreatePipelineCommand for
> pipeline:PipelineID=48431096-9933-46d6-a462-abfc89ecd8b0 to
> datanode:06c64b46-3f66-45e6-8c65-1ce0bd979379
> {noformat}
> Pipeline close:
> {noformat}
> 2024-10-01 09:51:24,132 INFO
> [node1-EventQueue-StaleNodeForStaleNodeHandler]-org.apache.hadoop.hdds.scm.node.StaleNodeHandler:
> Datanode
> 06c64b46-3f66-45e6-8c65-1ce0bd979379(ccycloud-5.quasar-aljjma.root.comops.site/10.140.13.6)
> moved to stale state. Finalizing its pipelines
> [PipelineID=48431096-9933-46d6-a462-abfc89ecd8b0,
> PipelineID=ffe5aa54-f12b-4334-aae4-5921f54bb916,
> PipelineID=053b7d1d-e351-454b-94f1-f2cf81c403df,
> PipelineID=4b647943-30e5-49d7-8f4a-cd374b7e8e1b,
> PipelineID=1b5f1653-671b-4959-a684-2c8eb7a6b96f]
> 2024-10-01 09:51:24,140 INFO
> [node1-EventQueue-StaleNodeForStaleNodeHandler]-org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl:
> Pipeline PipelineID=48431096-9933-46d6-a462-abfc89ecd8b0 moved to CLOSED
> state
> {noformat}
> Pipeline allocation timeout:
> {noformat}
> 2024-10-01 09:52:07,368 WARN [IPC Server handler 95 on
> 9863]-org.apache.hadoop.hdds.scm.pipeline.WritableRatisContainerProvider:
> Pipeline creation failed for repConfig: RATIS/THREE. Retrying get pipelines
> call once.
> java.io.IOException: Pipeline 48431096-9933-46d6-a462-abfc89ecd8b0 is not
> ready in 60000 ms
> at
> org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl.waitOnePipelineReady(PipelineManagerImpl.java:772)
> at
> org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl.waitPipelineReady(PipelineManagerImpl.java:725)
> at
> org.apache.hadoop.hdds.scm.pipeline.WritableRatisContainerProvider.getContainer(WritableRatisContainerProvider.java:103)
> at
> org.apache.hadoop.hdds.scm.pipeline.WritableContainerFactory.getContainer(WritableContainerFactory.java:74)
> at
> org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:163)
> at
> org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:216)
> at
> org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:198)
> {noformat}
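
For illustration, the expected behavior described above might look roughly like the sketch below. In the logs, the pipeline moved to CLOSED at 09:51:24,140, yet the wait only gave up at 09:52:07,368, about 43 seconds later. The sketch wakes the waiter as soon as the pipeline state changes and then falls through to the next candidate pipeline. Everything here is hypothetical: `PipelineRetrySketch`, `Pipeline`, `State`, `awaitNotAllocated` and `firstUsable` are simplified stand-ins invented for this example, not the actual `PipelineManagerImpl` or `WritableRatisContainerProvider` code, and the real fix may look quite different.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch only; none of these types are real Ozone classes. */
final class PipelineRetrySketch {

  /** Simplified stand-in for the pipeline lifecycle states. */
  enum State { ALLOCATED, OPEN, CLOSED }

  /** Hypothetical pipeline whose state changes wake up any waiters. */
  static final class Pipeline {
    final String id;
    private State state;

    Pipeline(String id, State state) {
      this.id = id;
      this.state = state;
    }

    synchronized void setState(State newState) {
      state = newState;
      notifyAll(); // preempt waiters instead of letting them sleep to the timeout
    }

    /**
     * Blocks while the pipeline is still ALLOCATED, for at most timeoutMs.
     * Returns the state observed when the wait ended.
     */
    synchronized State awaitNotAllocated(long timeoutMs) throws InterruptedException {
      long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
      while (state == State.ALLOCATED) {
        long remainingNanos = deadline - System.nanoTime();
        if (remainingNanos <= 0) {
          break; // timed out; caller decides what to do next
        }
        TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
      }
      return state;
    }
  }

  /**
   * Tries each candidate in order. A pipeline that is CLOSED, or that is still
   * ALLOCATED after the timeout, is skipped rather than failing the request.
   */
  static Pipeline firstUsable(List<Pipeline> candidates, long timeoutMs)
      throws IOException, InterruptedException {
    for (Pipeline candidate : candidates) {
      if (candidate.awaitNotAllocated(timeoutMs) == State.OPEN) {
        return candidate;
      }
    }
    throw new IOException("No usable pipeline among " + candidates.size() + " candidates");
  }

  public static void main(String[] args) throws Exception {
    Pipeline first = new Pipeline("pipeline-A", State.ALLOCATED);
    Pipeline second = new Pipeline("pipeline-B", State.OPEN);

    // Simulate the stale-node handler closing the first pipeline mid-wait.
    Thread closer = new Thread(() -> {
      try {
        Thread.sleep(100);
      } catch (InterruptedException e) {
        return;
      }
      first.setState(State.CLOSED);
    });
    closer.start();

    Pipeline chosen = firstUsable(List.of(first, second), 60_000);
    closer.join();
    System.out.println("chosen pipeline: " + chosen.id);
  }
}
```

The point of the sketch is that a close event and a timeout are handled the same way: both fall through to another pipeline, so neither path surfaces as a client abort while a usable pipeline exists.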