[ 
https://issues.apache.org/jira/browse/HDDS-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDDS-9151:
------------------------------------
    Description: 
In testing we have found an issue in the ECWritableContainerProvider.

For EC, a pipeline is used for only one container, so when the container gets 
closed, the pipeline also gets closed. At the moment, the only place in the 
code which closes EC pipelines that no longer have an open container is 
inside the ECWritableContainerProvider. It first gets the list of open 
pipelines and enforces the pipeline limit, then, for all open pipelines, it 
tries to find one the client can use.

If the client has had problems writing to the pipelines (e.g. it was given a 
container/pipeline and then the write failed because the container was closed 
on the DN), the pipelines get added to the exclude list. We can then get into 
a situation where many pipelines need to be closed on the write path, slowing 
down block allocation.

Ideally, when a container transitions to CLOSING in SCM, if the container is an 
EC container, we should also close the associated pipeline to avoid it counting 
toward the limit and to avoid needing to close it during the write (block 
allocation) path.

This could be achieved relatively simply inside the 
PipelineManagerImpl.removeContainersFromPipeline() method which is called as 
soon as the container transitions to CLOSING via 
ContainerStateManagerImpl.updateContainerState() when it executes the 
containerStateChangeActions. Wrapping the container close and pipeline close in 
a lock inside PipelineManagerImpl ensures we have a consistent "EC container 
close" flow, and it should avoid the ECWritableContainerProvider needing to 
close the pipelines internally. However, we can leave that code in place in 
ECWritableContainerProvider in case some pipelines slip through somehow.
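
As a rough illustration of the flow proposed above, here is a minimal, 
self-contained Java sketch. It is not the Ozone code: EcPipelineCloseSketch, 
PipelineManagerSketch, Pipeline, onContainerClosing() and openPipelineCount() 
are all hypothetical names made up for this example, and the real 
PipelineManagerImpl.removeContainersFromPipeline() has a different signature 
and more responsibilities. The only point shown is that detaching the 
container and closing its EC pipeline happen under one lock, so the pipeline 
stops counting toward the limit as soon as the container goes to CLOSING.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical, simplified model; none of these types are real Ozone classes. */
public class EcPipelineCloseSketch {

    enum ReplicationType { RATIS, EC }

    enum PipelineState { OPEN, CLOSED }

    /** Toy pipeline: an id, a replication type, and a mutable state. */
    static final class Pipeline {
        final String id;
        final ReplicationType type;
        PipelineState state = PipelineState.OPEN;

        Pipeline(String id, ReplicationType type) {
            this.id = id;
            this.type = type;
        }
    }

    /** Toy stand-in for the pipeline bookkeeping in SCM (assumed structure). */
    static final class PipelineManagerSketch {
        private final ReentrantLock lock = new ReentrantLock();
        private final List<Pipeline> pipelines = new ArrayList<>();
        private final Map<Long, Pipeline> pipelineByContainer = new HashMap<>();

        /** Registers a container as living on the given pipeline. */
        void addContainer(long containerId, Pipeline pipeline) {
            lock.lock();
            try {
                if (!pipelines.contains(pipeline)) {
                    pipelines.add(pipeline);
                }
                pipelineByContainer.put(containerId, pipeline);
            } finally {
                lock.unlock();
            }
        }

        /**
         * Invoked when a container transitions to CLOSING. The container is
         * detached from its pipeline and, for EC only, the pipeline is closed
         * in the same critical section.
         */
        void onContainerClosing(long containerId) {
            lock.lock();
            try {
                Pipeline pipeline = pipelineByContainer.remove(containerId);
                if (pipeline != null && pipeline.type == ReplicationType.EC) {
                    pipeline.state = PipelineState.CLOSED;
                }
            } finally {
                lock.unlock();
            }
        }

        /** Number of pipelines that would still count toward the limit. */
        int openPipelineCount() {
            lock.lock();
            try {
                int open = 0;
                for (Pipeline pipeline : pipelines) {
                    if (pipeline.state == PipelineState.OPEN) {
                        open++;
                    }
                }
                return open;
            } finally {
                lock.unlock();
            }
        }
    }

    public static void main(String[] args) {
        PipelineManagerSketch manager = new PipelineManagerSketch();
        manager.addContainer(1L, new Pipeline("ec-1", ReplicationType.EC));
        manager.addContainer(2L, new Pipeline("ratis-1", ReplicationType.RATIS));

        manager.onContainerClosing(1L); // EC: pipeline is closed as well
        manager.onContainerClosing(2L); // non-EC: pipeline is left open

        System.out.println("open pipelines: " + manager.openPipelineCount());
    }
}
```

In this toy model, only the non-EC pipeline is still open after both 
containers have moved to CLOSING, so nothing is left for the write path to 
clean up on the EC side.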

  was:
In testing we have found an issue in the ECWritableContainerProvider.

For EC, a pipeline is used for only one container, so when the container gets 
closed, the pipeline also gets closed. However, closing the container can take 
some time. First it is marked as CLOSING in SCM, then SCM sends commands to the 
DNs to close it, and finally the container gets CLOSED.

As we limit the number of pipelines in SCM, containers in a CLOSING state mean 
there are containers/pipelines which are effectively closed, but the pipelines 
are still counted toward the limit.

Ideally, when a container transitions to CLOSING in SCM, if the container is an 
EC container, we should also close the associated pipeline to avoid it counting 
toward the limit.

This could be achieved relatively simply inside the 
PipelineManagerImpl.removeContainersFromPipeline() method which is called as 
soon as the container transitions to CLOSING via 
ContainerStateManagerImpl.updateContainerState() when it executes the 
containerStateChangeActions.


> Close EC Pipeline when container transitions to closing
> -------------------------------------------------------
>
>                 Key: HDDS-9151
>                 URL: https://issues.apache.org/jira/browse/HDDS-9151
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major


