[
https://issues.apache.org/jira/browse/HDDS-6598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525255#comment-17525255
]
Stephen O'Donnell commented on HDDS-6598:
-----------------------------------------
I'm not sure about the best way to approach this. Ideally we would have one
thread that removes all closed pipelines that are ready to be removed, rather
than one for EC and one for Ratis.
It is a bit strange that the current logic is called
BackgroundPipelineCreator and it also performs pipeline cleanup. However, I see
that it cleans up pipelines before it creates new ones. Does removing closed
pipelines allow for more new ones to be created? Checking the
RatisPipelineProvider, it seems to ignore CLOSED pipelines in the limits:
{code}
// Per datanode limit
if (maxPipelinePerDatanode > 0) {
  return (getPipelineStateManager().getPipelines(replicationConfig).size() -
      getPipelineStateManager().getPipelines(replicationConfig,
          PipelineState.CLOSED).size()) > maxPipelinePerDatanode *
      getNodeManager().getNodeCount(NodeStatus.inServiceHealthy()) /
      replicationConfig.getRequiredNodes();
}
// Global limit
if (pipelineNumberLimit > 0) {
  return (getPipelineStateManager().getPipelines(replicationConfig).size() -
      getPipelineStateManager().getPipelines(
          replicationConfig, PipelineState.CLOSED).size()) >
      (pipelineNumberLimit - getPipelineStateManager()
          .getPipelines(RatisReplicationConfig
              .getInstance(ReplicationFactor.ONE))
          .size());
}
{code}
Since CLOSED pipelines are already subtracted from the count, I don't think
the closed pipelines need to be removed as part of pipeline creation.
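To make the arithmetic concrete, here is a minimal sketch of the per-datanode check with plain integers standing in for the state manager and node manager lookups. The class name PipelineLimitSketch, the method exceedsDatanodeLimit, and the sample numbers are illustrative assumptions, not Ozone code.

```java
// Hypothetical, simplified stand-in for the per-datanode check quoted above.
// Plain ints replace the PipelineStateManager and NodeManager lookups.
final class PipelineLimitSketch {

  /** Returns true when the non-CLOSED pipeline count exceeds the cap. */
  static boolean exceedsDatanodeLimit(int totalPipelines, int closedPipelines,
      int maxPipelinePerDatanode, int healthyNodes, int requiredNodes) {
    // CLOSED pipelines are subtracted before comparing against the cap.
    int nonClosed = totalPipelines - closedPipelines;
    // Same integer arithmetic as the quoted code: multiply, then divide.
    return nonClosed > maxPipelinePerDatanode * healthyNodes / requiredNodes;
  }

  public static void main(String[] args) {
    // Assumed numbers: 9 healthy datanodes, 3 nodes per pipeline,
    // 2 pipelines per datanode, so the cap is 2 * 9 / 3 = 6.
    // 5 non-CLOSED pipelines plus 1000 CLOSED ones stay under the cap.
    System.out.println(exceedsDatanodeLimit(1005, 1000, 2, 9, 3)); // false
    // 7 non-CLOSED pipelines exceed the cap.
    System.out.println(exceedsDatanodeLimit(7, 0, 2, 9, 3)); // true
  }
}
```

With these assumed numbers, a thousand leftover CLOSED records do not change the result of the check, which is consistent with the reading that cleanup does not have to run before creation.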
I think I would be in favour of a new thread to remove all pipelines that
should be removed (Ratis, EC), and remove the current logic from the creation
thread. There may also be a time in the future where we want to close
long-lived pipelines after some time, as has been discussed before, so a new
pipeline closing / scrubber thread could handle that too, if it is ever needed.
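A rough sketch of what a single pass of such a scrubber might look like, purely to illustrate the idea. Every type here (PipelineScrubberSketch, PipelineRecord, scrubOnce) is a hypothetical stand-in rather than the real PipelineManager API, and the timeout is an assumed parameter.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of one scrub pass; not the real SCM classes.
final class PipelineScrubberSketch {
  enum ReplicationType { RATIS, EC }
  enum State { OPEN, CLOSED }

  static final class PipelineRecord {
    final String id;
    final ReplicationType type;
    final State state;
    final Instant stateEnteredAt;

    PipelineRecord(String id, ReplicationType type, State state,
        Instant stateEnteredAt) {
      this.id = id;
      this.type = type;
      this.state = state;
      this.stateEnteredAt = stateEnteredAt;
    }
  }

  /**
   * Removes every CLOSED record whose time in that state has reached the
   * timeout, regardless of replication type, and returns the removed ids.
   */
  static List<String> scrubOnce(List<PipelineRecord> records,
      Duration timeout, Instant now) {
    List<String> removed = new ArrayList<>();
    Iterator<PipelineRecord> it = records.iterator();
    while (it.hasNext()) {
      PipelineRecord p = it.next();
      boolean expired =
          Duration.between(p.stateEnteredAt, now).compareTo(timeout) >= 0;
      if (p.state == State.CLOSED && expired) {
        it.remove();
        removed.add(p.id);
      }
    }
    return removed;
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2022-04-20T12:00:00Z");
    List<PipelineRecord> records = new ArrayList<>();
    records.add(new PipelineRecord("ratis-1", ReplicationType.RATIS,
        State.CLOSED, now.minus(Duration.ofMinutes(10))));
    records.add(new PipelineRecord("ec-1", ReplicationType.EC,
        State.CLOSED, now.minus(Duration.ofMinutes(10))));
    // Closed too recently, so it is kept for now.
    records.add(new PipelineRecord("ec-2", ReplicationType.EC,
        State.CLOSED, now.minus(Duration.ofMinutes(1))));
    // Still OPEN, so it is never touched by this pass.
    records.add(new PipelineRecord("ec-3", ReplicationType.EC,
        State.OPEN, now.minus(Duration.ofMinutes(60))));

    System.out.println(scrubOnce(records, Duration.ofMinutes(5), now));
    System.out.println(records.size());
  }
}
```

In a real service a pass like this could be driven by a single scheduled thread, for example via ScheduledExecutorService.scheduleWithFixedDelay, so that Ratis and EC records are handled in one place and in batches.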
> EC: EC pipeline records are not removed after close.
> ----------------------------------------------------
>
> Key: HDDS-6598
> URL: https://issues.apache.org/jira/browse/HDDS-6598
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Mark Gui
> Assignee: Mark Gui
> Priority: Major
>
> After stressing a cluster for several days, we found that there are a lot of
> CLOSED EC pipelines.
> {code:java}
> [ozoneadmin@TENCENT64 ~/ozone-1.3.0-SNAPSHOT]$ ./bin/ozone admin pipeline list --state=CLOSED | wc -l
> 997
> {code}
> This makes commands return slowly (e.g. ozone admin datanode list, ozone
> admin pipeline list), and it could add unnecessary burden to SCM HA, so
> these CLOSED EC pipelines should be cleaned up properly.
> Several ways to consider:
> # We close pipelines in `WritableECContainerProvider` by calling
> `pipelineManager.closePipeline(pipeline, true);`, where `true` means we
> don't remove the pipeline record until a timeout. But the removal actually
> only happens for Ratis pipelines, in `BackgroundPipelineCreator` when doing
> `pipelineManager.scrubPipeline(replicationConfig);`. We could change it to
> `false`, which would remove the records of CLOSED pipelines that get
> selected, but the records of unselected CLOSED pipelines would remain.
> # We could try to close the pipeline after the container close event from
> the DN is received. But container close follows a lifecycle like: OPEN ->
> CLOSING -> QUASI_CLOSED -> CLOSED. I think it would be tricky to hook a
> pipeline close action after an EC container is closed.
> # We could have a dedicated background thread that runs periodically to
> clean up the CLOSED pipelines in a batch. This also benefits SCM HA compared
> to solution 1, since we tend to do batch cleanups instead of one by one.
> I think we could choose solution 3 to solve this problem.
>