[ 
https://issues.apache.org/jira/browse/HDDS-6598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Gui updated HDDS-6598:
---------------------------
    Description: 
After stressing a cluster for several days, we found that there are a lot of 
CLOSED EC pipelines.
{code:java}
[ozoneadmin@TENCENT64 ~/ozone-1.3.0-SNAPSHOT]$ ./bin/ozone admin pipeline list --state=CLOSED | wc -l
997
{code}
This makes commands such as `ozone admin datanode list` and `ozone admin 
pipeline list` return slowly, and it potentially adds unnecessary burden to 
SCM HA, so these CLOSED EC pipelines should be cleaned up properly.

Several ways to consider:
 # We close pipelines in `WritableECContainerProvider` by calling 
`pipelineManager.closePipeline(pipeline, true);`. Here `true` means the 
pipeline record is not removed until a timeout expires. However, the removal 
only happens for Ratis pipelines, in `BackgroundPipelineCreator` when it calls 
`pipelineManager.scrubPipeline(replicationConfig);`. We could pass `false` 
instead, which would remove the records of the CLOSED pipelines that were 
selected, but the unselected CLOSED pipeline records would remain.
 # We could try to close the pipeline after the container close event from the 
DN is received. But a container close follows a lifecycle like OPEN -> CLOSING 
-> QUASI_CLOSED -> CLOSED, so I think it would be tricky to hook a pipeline 
close action after an EC container is closed.
 # We could have a dedicated background thread that runs periodically to 
clean up the CLOSED pipelines in a batch. This also benefits SCM HA compared to 
solution 1, since we tend to do batch cleanups instead of removing records one 
by one.

I think we could choose solution 3 to solve this problem.
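
To make solution 3 more concrete, here is a minimal sketch of a periodic batch cleaner. It is only an illustration under stated assumptions: `ClosedPipelineCleaner`, `PipelineRecord`, the `retention` timeout and the in-memory map are all hypothetical stand-ins, not the actual SCM `PipelineManager` API, and a real implementation would have to go through the SCM HA replicated state rather than a local map.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for SCM pipeline bookkeeping; not the Ozone API. */
class ClosedPipelineCleaner {
  enum State { OPEN, CLOSED }

  /** Minimal pipeline record: id, state, and when that state was entered. */
  static final class PipelineRecord {
    final String id;
    final State state;
    final Instant stateEnteredAt;

    PipelineRecord(String id, State state, Instant stateEnteredAt) {
      this.id = id;
      this.state = state;
      this.stateEnteredAt = stateEnteredAt;
    }
  }

  private final Map<String, PipelineRecord> pipelines = new ConcurrentHashMap<>();
  private final Duration retention; // assumed timeout before a CLOSED record may go

  ClosedPipelineCleaner(Duration retention) {
    this.retention = retention;
  }

  void put(String id, State state, Instant stateEnteredAt) {
    pipelines.put(id, new PipelineRecord(id, state, stateEnteredAt));
  }

  int size() {
    return pipelines.size();
  }

  /** One pass: collect every expired CLOSED record, then remove them together. */
  List<String> cleanOnce(Instant now) {
    List<String> expired = new ArrayList<>();
    for (PipelineRecord p : pipelines.values()) {
      boolean timedOut =
          Duration.between(p.stateEnteredAt, now).compareTo(retention) >= 0;
      if (p.state == State.CLOSED && timedOut) {
        expired.add(p.id);
      }
    }
    // Single bulk removal; in SCM this is where one batched update would go.
    pipelines.keySet().removeAll(expired);
    return expired;
  }

  /** Runs cleanOnce on a dedicated daemon thread at a fixed delay. */
  ScheduledExecutorService start(Duration interval) {
    ScheduledExecutorService ses =
        Executors.newSingleThreadScheduledExecutor(r -> {
          Thread t = new Thread(r, "closed-pipeline-cleaner");
          t.setDaemon(true);
          return t;
        });
    ses.scheduleWithFixedDelay(() -> cleanOnce(Instant.now()),
        interval.toMillis(), interval.toMillis(), TimeUnit.MILLISECONDS);
    return ses;
  }

  public static void main(String[] args) throws InterruptedException {
    ClosedPipelineCleaner cleaner = new ClosedPipelineCleaner(Duration.ZERO);
    cleaner.put("ec-1", State.CLOSED, Instant.now());
    cleaner.put("ec-2", State.OPEN, Instant.now());
    ScheduledExecutorService ses = cleaner.start(Duration.ofMillis(50));
    Thread.sleep(300);
    ses.shutdownNow();
    System.out.println("records left: " + cleaner.size());
  }
}
```

The point of the sketch is that the cleaner does not care how a pipeline got to CLOSED or whether it was selected, so it also covers the records that solution 1 would leave behind, and each pass produces one list of ids to remove instead of many single removals.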

 

  was:
After stressing a cluster for several days, we found that there are a lot of 
CLOSED EC pipelines.
{code:java}
[ozoneadmin@TENCENT64 ~/ozone-1.3.0-SNAPSHOT]$ ./bin/ozone admin pipeline list 
--state=CLOSED | wc -l
997 {code}
This makes commands such as `ozone admin datanode list` and `ozone admin 
pipeline list` return slowly, and it potentially adds unnecessary burden to 
SCM HA, so these CLOSED EC pipelines should be cleaned up properly.

Several ways to consider:

- We close pipelines in `WritableECContainerProvider` by calling 
`pipelineManager.closePipeline(pipeline, true);`

 


> EC: EC pipeline records are not removed after close.
> ----------------------------------------------------
>
>                 Key: HDDS-6598
>                 URL: https://issues.apache.org/jira/browse/HDDS-6598
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Mark Gui
>            Assignee: Mark Gui
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
