[ 
https://issues.apache.org/jira/browse/HDDS-6292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-6292:
---------------------------------
    Labels: pull-request-available  (was: )

> Ensure immutable ContainerReplica set is returned from 
> ContainerStateManagerImpl
> --------------------------------------------------------------------------------
>
>                 Key: HDDS-6292
>                 URL: https://issues.apache.org/jira/browse/HDDS-6292
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: SCM
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>
> Inside ContainerStateMap, the replicas for a container are stored in a Set 
> backed by a ConcurrentHashMap.
> When you ask for the current replicas of a container, this method is used:
> {code:java}
> public Set<ContainerReplica> getContainerReplicas(
>       final ContainerID containerID) {
>     Preconditions.checkNotNull(containerID);
>     final Set<ContainerReplica> replicas = replicaMap.get(containerID);
>     return replicas == null ? null : Collections.unmodifiableSet(replicas);
> } {code}
> Note that it pulls out the Set, wraps it as unmodifiable and returns it.
> There is a problem here, in that if the Set is updated by ICR / FCR at the 
> same time as another part of the code has taken a reference to it, the other 
> part of the code can make incorrect decisions. Eg:
>  
> {code:java}
> Set<> replicas = getContainerReplicas(...)
> replicaCount = replicas.size()
> // continue to do something based on the size{code}
> ReplicationManger has run into a race condition like this. We also use the 
> Replicas to form pipelines for closed containers, so I worry there could be 
> some strange issues if the set if mutated during the pipeline creation.
> I see two possible solutions here. `GetContainerReplicas` should create a 
> copy of the Set and return that, so the copy the other part of the code gets 
> is its own copy and nothing can change it.
> Or, we make the Set immutable, so that each new replica details are received, 
> we create the new copy of the set and store that. Then any other parts of the 
> code can get a reference to it, and know it will never change.
> Mutations to the replicas for a closed container will only happen with FCR, 
> which is relatively rare.
> However we may ask for read pipelines very frequently, so it would be cheaper 
> overall to use option 2.
> It we go with option 2, I think we can move from a concurrentHashMap to a 
> plain hashMap too, which may make the memory footprint slightly smaller.
> Note access to the replicas is via ContainerStateManagerImpl, which already 
> has a course RW lock protecting access to the container manager. Quite 
> possibly FCR reporting could be improved by a finer grained or striped lock.
> This problem was reported in HDDS-5643.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to