[jira] [Updated] (HDDS-6307) Improve processing and memory efficiency of container reports

ASF GitHub Bot (Jira) Mon, 14 Feb 2022 05:19:04 -0800


     [ 
https://issues.apache.org/jira/browse/HDDS-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ASF GitHub Bot updated HDDS-6307:
---------------------------------
    Labels: pull-request-available  (was: )

> Improve processing and memory efficiency of container reports
> -------------------------------------------------------------
>
>                 Key: HDDS-6307
>                 URL: https://issues.apache.org/jira/browse/HDDS-6307
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: SCM
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>
> The container report handling code has some issues that this Jira intends to 
> address.
> When handling full container reports, we make several copies of large sets to 
> identify containers that are in SCM but not reported by the DN (they have 
> somehow been lost on the DN).
> Looking at the current code:
> {code}
>       synchronized (datanodeDetails) {
>         final List<ContainerReplicaProto> replicas =
>             containerReport.getReportsList();
>         final Set<ContainerID> containersInSCM =
>             nodeManager.getContainers(datanodeDetails);
>         final Set<ContainerID> containersInDn = replicas.parallelStream()
>             .map(ContainerReplicaProto::getContainerID)
>             .map(ContainerID::valueOf).collect(Collectors.toSet());
>         final Set<ContainerID> missingReplicas =
>             new HashSet<>(containersInSCM);
>         missingReplicas.removeAll(containersInDn);
>         processContainerReplicas(datanodeDetails, replicas, publisher);
>         processMissingReplicas(datanodeDetails, missingReplicas);
>         /*
>          * Update the latest set of containers for this datanode in
>          * NodeManager
>          */
>         nodeManager.setContainers(datanodeDetails, containersInDn);
>         containerManager.notifyContainerReportProcessing(true, true);
>       }
> {code}
> The Set "containersInSCM" comes from NodeStateMap:
> {code}
>   public Set<ContainerID> getContainers(UUID uuid)
>       throws NodeNotFoundException {
>     lock.readLock().lock();
>     try {
>       checkIfNodeExist(uuid);
>       return Collections
>           .unmodifiableSet(new HashSet<>(nodeToContainer.get(uuid)));
>     } finally {
>       lock.readLock().unlock();
>     }
>   }
> {code}
> This already returns a new copy of the set, so there is no need to wrap it 
> in Collections.unmodifiableSet. Without the wrapper, the report handler can 
> modify the returned set directly instead of copying it again. 
> The current code ends up with 3 copies of this potentially large set.
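
The point above can be illustrated with a minimal sketch. It is not the Ozone code: NodeContainerIndexSketch is a hypothetical stand-in for NodeStateMap, Long stands in for ContainerID, String stands in for the datanode UUID, and the read/write lock is left out for brevity.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for NodeStateMap (assumption, not the real class).
final class NodeContainerIndexSketch {
  private final Map<String, Set<Long>> nodeToContainer = new HashMap<>();

  void setContainers(String node, Set<Long> ids) {
    nodeToContainer.put(node, new HashSet<>(ids));
  }

  // Returns a private copy. The caller owns it, so no unmodifiable wrapper
  // is needed to protect the internal set.
  Set<Long> getContainers(String node) {
    Set<Long> ids = nodeToContainer.get(node);
    if (ids == null) {
      throw new IllegalArgumentException("unknown node: " + node);
    }
    return new HashSet<>(ids);
  }

  // Missing = known to SCM but absent from the report. The returned copy is
  // mutated in place, so no second copy of the SCM-side set is made.
  static Set<Long> missingReplicas(NodeContainerIndexSketch index,
      String node, List<Long> reported) {
    Set<Long> missing = index.getContainers(node);
    reported.forEach(missing::remove);
    return missing;
  }
}
```

With this shape, the only copy of the SCM-side set is the one made inside getContainers, and the internal set is still never exposed to callers.
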
> Next we take the full container report (FCR) and stream it into a set of 
> ContainerID. This set is used for two purposes: 1, to subtract from 
> "containersInSCM" to yield the missing containers; 2, to replace the set 
> already in nodeManager.
> We can avoid this second large set by simply adding each ContainerID to 
> nodeManager if it is not already there, and then removing any "missing" 
> containers from nodeManager at the end. This also avoids replacing the entire 
> set of ContainerIDs, and all the tenured objects associated with it, with new, 
> effectively identical object instances.
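
A rough sketch of the add-then-remove approach described above, under stated assumptions: ReportReconcileSketch and reconcile are hypothetical names, Long stands in for ContainerID, and scmView stands in for the per-datanode set held by nodeManager. Locking and event publishing are omitted.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (assumption, not part of Ozone).
final class ReportReconcileSketch {

  // Updates scmView in place so that it matches the report, and returns the
  // IDs that SCM knew about but the datanode did not report.
  static Set<Long> reconcile(Set<Long> scmView, List<Long> reported) {
    // The only copy made: it starts as everything SCM knows for this node
    // and shrinks as reported IDs are seen.
    Set<Long> missing = new HashSet<>(scmView);
    for (Long id : reported) {
      if (!missing.remove(id)) {
        // Either new for this datanode or a duplicate entry in the report;
        // add() is a no-op in the duplicate case.
        scmView.add(id);
      }
    }
    // Whatever is left was not reported, so drop it from the node's set.
    scmView.removeAll(missing);
    return missing;
  }
}
```

Existing entries in scmView are left untouched, so the long-lived objects already in the set are not replaced by equal new instances.
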
> Finally, we can process the replicas in the same pass and re-use the 
> ContainerID object, which we currently create twice: we store one copy in 
> nodeManager, and then another distinct copy in each ContainerReplica.
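
One way the re-use could look, as a sketch only. Id and Replica are hypothetical minimal types standing in for ContainerID and ContainerReplica, and nodeIds stands in for the per-datanode collection in nodeManager. The idea is that the ID object is created at most once per container and the same reference is handed to the replica.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration (assumption, not the Ozone classes).
final class ContainerIdReuseSketch {

  // Minimal stand-in for ContainerID.
  static final class Id {
    final long value;
    Id(long value) { this.value = value; }
  }

  // Minimal stand-in for ContainerReplica.
  static final class Replica {
    final Id id;
    final String datanode;
    Replica(Id id, String datanode) {
      this.id = id;
      this.datanode = datanode;
    }
  }

  // Stand-in for the IDs nodeManager tracks for one datanode.
  final Map<Long, Id> nodeIds = new HashMap<>();
  final List<Replica> replicas = new ArrayList<>();

  // Processes the report in one pass: the Id is created only if it is not
  // already tracked, and the replica shares that same instance.
  void process(String datanode, List<Long> reportedIds) {
    for (long raw : reportedIds) {
      Id id = nodeIds.computeIfAbsent(raw, Id::new);
      replicas.add(new Replica(id, datanode));
    }
  }
}
```
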
> I checked how many ContainerID objects are present in SCM, by loading 10 
> closed containers with 3 replicas, and capturing a heap histogram with jmap:
> {code}
> bash-4.2$ jmap -histo:live 8 | grep ContainerID
>  262:            81           1944  org.apache.hadoop.hdds.scm.container.ContainerID
>  301:            30           1440  org.apache.hadoop.hdds.scm.container.ContainerReplica
>  419:            10            800  org.apache.hadoop.hdds.scm.container.ContainerInfo
> {code}
> There are 10 Containers as expected, 30 replicas, but 81 ContainerID objects. 
> Ideally there should be 10, to match the number of Containers, but some of 
> these may be created from other code paths, such as pipeline creation.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

