[
https://issues.apache.org/jira/browse/HDDS-6292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HDDS-6292:
---------------------------------
Labels: pull-request-available (was: )
> Ensure immutable ContainerReplica set is returned from
> ContainerStateManagerImpl
> --------------------------------------------------------------------------------
>
> Key: HDDS-6292
> URL: https://issues.apache.org/jira/browse/HDDS-6292
> Project: Apache Ozone
> Issue Type: Improvement
> Components: SCM
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Labels: pull-request-available
>
> Inside ContainerStateMap, the replicas for a container are stored in a Set
> backed by a ConcurrentHashMap.
> When you ask for the current replicas of a container, this method is used:
> {code:java}
> public Set<ContainerReplica> getContainerReplicas(
> final ContainerID containerID) {
> Preconditions.checkNotNull(containerID);
> final Set<ContainerReplica> replicas = replicaMap.get(containerID);
> return replicas == null ? null : Collections.unmodifiableSet(replicas);
> } {code}
> Note that it pulls out the Set, wraps it as unmodifiable and returns it.
> There is a problem here, in that if the Set is updated by ICR / FCR at the
> same time as another part of the code has taken a reference to it, the other
> part of the code can make incorrect decisions. Eg:
>
> {code:java}
> Set<> replicas = getContainerReplicas(...)
> replicaCount = replicas.size()
> // continue to do something based on the size{code}
> ReplicationManger has run into a race condition like this. We also use the
> Replicas to form pipelines for closed containers, so I worry there could be
> some strange issues if the set if mutated during the pipeline creation.
> I see two possible solutions here. `GetContainerReplicas` should create a
> copy of the Set and return that, so the copy the other part of the code gets
> is its own copy and nothing can change it.
> Or, we make the Set immutable, so that each new replica details are received,
> we create the new copy of the set and store that. Then any other parts of the
> code can get a reference to it, and know it will never change.
> Mutations to the replicas for a closed container will only happen with FCR,
> which is relatively rare.
> However we may ask for read pipelines very frequently, so it would be cheaper
> overall to use option 2.
> It we go with option 2, I think we can move from a concurrentHashMap to a
> plain hashMap too, which may make the memory footprint slightly smaller.
> Note access to the replicas is via ContainerStateManagerImpl, which already
> has a course RW lock protecting access to the container manager. Quite
> possibly FCR reporting could be improved by a finer grained or striped lock.
> This problem was reported in HDDS-5643.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]