ChenSammi commented on a change in pull request #1469: HDDS-2034. Async RATIS pipeline creation and destroy through heartbea… URL: https://github.com/apache/hadoop/pull/1469#discussion_r329335596
##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/BlockManagerImpl.java
##########
@@ -188,6 +208,15 @@ public AllocatedBlock allocateBlock(final long size, ReplicationType type,
           // TODO: #CLUTIL Remove creation logic when all replication types and
           // factors are handled by pipeline creator
           pipeline = pipelineManager.createPipeline(type, factor);
+          // wait until pipeline is ready
+          long current = System.currentTimeMillis();
+          while (!pipeline.isOpen() && System.currentTimeMillis() <
+              (current + pipelineCreateWaitTimeout)) {
+            try {
+              Thread.sleep(1000);
+            } catch (InterruptedException e) {
+            }
+          }

Review comment:
Creating a pipeline in the block allocation path is debatable. An existing comment says "TODO: #CLUTIL Remove creation logic when all replication types and factors are handled by pipeline creator".

In detail, the "ALLOCATED" state will be handled in task HDDS-2177, "Add a scrubber thread to detect creation failure pipelines in ALLOCATED state". Currently pipelineCreateWaitTimeout is calculated from "hdds.command.status.report.interval" and "hdds.heartbeat.interval", on the assumption that the connection between the Datanode and SCM is healthy. What if the pipeline is created successfully, but the connection to SCM breaks and is restored only after a while? Should we wait a little longer before deciding whether pipeline creation succeeded or failed? So in HDDS-2177, I plan to add a configurable property for the pipeline creation timeout. Every ALLOCATED pipeline that exceeds the creation timeout will be declared failed and garbage collected.

Whether we use CompletableFuture or a while loop, we need a timeout either way. This is on the block allocation path; how much latency can a synchronous API tolerate? Maybe the best approach is not to create a pipeline in this case at all, provided we can make sure there are enough pipelines available after exiting safe mode.
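To make the timeout point concrete, here is a minimal sketch of a bounded wait in both styles mentioned in the comment, a polling loop and a CompletableFuture. The class `BoundedWait` and its methods `pollUntil` and `awaitOpen` are hypothetical names for illustration only; they are not part of this PR or of the HDDS code base. The sketch assumes the caller supplies a readiness check (for example, something like `pipeline::isOpen`) or a future that completes once the pipeline opens.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical helper, for illustration only; not HDDS code. */
public final class BoundedWait {
  private BoundedWait() { }

  /**
   * Polls isReady until it returns true or timeoutMs elapses.
   * Returns true if the condition was met before the deadline.
   * An interrupt is propagated to the caller instead of being swallowed.
   */
  public static boolean pollUntil(BooleanSupplier isReady, long timeoutMs,
      long pollMs) throws InterruptedException {
    // nanoTime is monotonic, so wall-clock adjustments do not affect the deadline.
    final long deadline =
        System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!isReady.getAsBoolean()) {
      long remainingNs = deadline - System.nanoTime();
      if (remainingNs <= 0) {
        return false;
      }
      // Never sleep past the deadline.
      TimeUnit.NANOSECONDS.sleep(
          Math.min(remainingNs, TimeUnit.MILLISECONDS.toNanos(pollMs)));
    }
    return true;
  }

  /**
   * Waits up to timeoutMs for the future to complete normally.
   * Returns false on timeout or if the future completed exceptionally.
   */
  public static boolean awaitOpen(CompletableFuture<Void> opened,
      long timeoutMs) throws InterruptedException {
    try {
      opened.get(timeoutMs, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException | ExecutionException e) {
      return false;
    }
  }
}
```

Either variant blocks the caller for up to the full timeout, so the latency question for a synchronous block allocation API remains the same whichever one is used.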