[ 
https://issues.apache.org/jira/browse/HDDS-6598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524216#comment-17524216
 ] 

Mark Gui commented on HDDS-6598:
--------------------------------

Yes, the `BackgroundPipelineCreator` acts as a background service to create 
Ratis pipelines, and {*}at the same time{*}, it cleans up CLOSED pipelines in 
the method #createPipelines.

The method works like this: for each ReplicationConfig (Ratis only, no EC), it 
first cleans up CLOSED pipelines (and those that have been in the ALLOCATED 
state for too long) and then creates new ones.
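
To make that flow concrete, here is a minimal, self-contained sketch. It is a 
toy model under stated assumptions, not the actual Ozone code: 
`PipelineScrubSketch`, its `Type` and `State` enums, `scrub`, `count`, the 
timeout, and the target count are all hypothetical names picked for 
illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model only: every type here is a hypothetical stand-in, not an Ozone class.
class PipelineScrubSketch {
    enum Type { RATIS, EC }
    enum State { ALLOCATED, OPEN, CLOSED }

    static final class Pipeline {
        final Type type;
        final State state;
        final long createdAtMs;

        Pipeline(Type type, State state, long createdAtMs) {
            this.type = type;
            this.state = state;
            this.createdAtMs = createdAtMs;
        }
    }

    final List<Pipeline> pipelines = new ArrayList<>();
    final long allocatedTimeoutMs;

    PipelineScrubSketch(long allocatedTimeoutMs) {
        this.allocatedTimeoutMs = allocatedTimeoutMs;
    }

    // Remove CLOSED records of the given type, plus ALLOCATED ones older
    // than the timeout. Returns how many records were removed.
    int scrub(Type type, long nowMs) {
        int removed = 0;
        Iterator<Pipeline> it = pipelines.iterator();
        while (it.hasNext()) {
            Pipeline p = it.next();
            if (p.type != type) {
                continue;
            }
            boolean staleAllocated = p.state == State.ALLOCATED
                && nowMs - p.createdAtMs > allocatedTimeoutMs;
            if (p.state == State.CLOSED || staleAllocated) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    // One background pass: scrub first, then top up OPEN pipelines.
    // Only RATIS is visited, so EC records are never scrubbed here.
    void createPipelines(long nowMs, int targetOpen) {
        for (Type type : new Type[] { Type.RATIS }) {
            scrub(type, nowMs);
            while (count(type, State.OPEN) < targetOpen) {
                pipelines.add(new Pipeline(type, State.OPEN, nowMs));
            }
        }
    }

    long count(Type type, State state) {
        return pipelines.stream()
            .filter(p -> p.type == type && p.state == state)
            .count();
    }
}
```

In this model, a CLOSED EC record added to `pipelines` survives every call to 
`createPipelines`, because the loop never visits `Type.EC`. That mirrors the 
accumulation described in the issue.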

I'm wondering whether it is a good idea to add EC handling to the 
`BackgroundPipelineCreator`, given that EC pipelines are created on demand?

> EC: EC pipeline records are not removed after close.
> ----------------------------------------------------
>
>                 Key: HDDS-6598
>                 URL: https://issues.apache.org/jira/browse/HDDS-6598
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Mark Gui
>            Assignee: Mark Gui
>            Priority: Major
>
> After stressing a cluster for several days, we found that there are a lot of 
> CLOSED EC pipelines.
> {code:java}
> [ozoneadmin@TENCENT64 ~/ozone-1.3.0-SNAPSHOT]$ ./bin/ozone admin pipeline 
> list --state=CLOSED | wc -l
> 997 {code}
> This makes commands return slowly (e.g. ozone admin datanode list, ozone admin 
> pipeline list), and it could add unnecessary load to SCM HA, so 
> these CLOSED EC pipelines should be cleaned up properly.
> Several ways to consider:
>  # We close pipelines in `WritableECContainerProvider` by calling 
> `pipelineManager.closePipeline(pipeline, true);`, where `true` means we 
> don't remove the pipeline record until a timeout. But the removal actually 
> only happens for Ratis pipelines, in `BackgroundPipelineCreator` when it calls 
> `pipelineManager.scrubPipeline(replicationConfig);`. We could change it to 
> `false`, so that the records of selected CLOSED pipelines are removed, but the 
> records of unselected CLOSED pipelines would remain.
>  # We could try to close the pipeline after the container close event from 
> the DN is received. But container close follows a lifecycle like: OPEN -> 
> CLOSING -> QUASI_CLOSED -> CLOSED. I think it would be tricky to hook a 
> pipeline close action after an EC container is closed.
>  # We could have a dedicated background thread that runs periodically to 
> clean up the CLOSED pipelines in a batch. This also benefits SCM HA compared 
> to solution 1, since we tend to do batch cleanups instead of one by one.
> I think we could choose solution 3 to solve this problem.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
