Stephen O'Donnell created HDDS-6307:
---------------------------------------

             Summary: Improve processing and memory efficiency of container 
reports
                 Key: HDDS-6307
                 URL: https://issues.apache.org/jira/browse/HDDS-6307
             Project: Apache Ozone
          Issue Type: Improvement
          Components: SCM
            Reporter: Stephen O'Donnell
            Assignee: Stephen O'Donnell


The container report handling code has some issues which this Jira intends to 
address.

When handling full container reports, we make several copies of large sets to 
identify containers known to SCM but not reported by the DN (i.e. replicas 
that have somehow been lost on the DN).

Looking at the current code:

{code}
      synchronized (datanodeDetails) {
        final List<ContainerReplicaProto> replicas =
            containerReport.getReportsList();
        final Set<ContainerID> containersInSCM =
            nodeManager.getContainers(datanodeDetails);

        final Set<ContainerID> containersInDn = replicas.parallelStream()
            .map(ContainerReplicaProto::getContainerID)
            .map(ContainerID::valueOf).collect(Collectors.toSet());

        final Set<ContainerID> missingReplicas = new HashSet<>(containersInSCM);
        missingReplicas.removeAll(containersInDn);

        processContainerReplicas(datanodeDetails, replicas, publisher);
        processMissingReplicas(datanodeDetails, missingReplicas);

        /*
         * Update the latest set of containers for this datanode in
         * NodeManager
         */
        nodeManager.setContainers(datanodeDetails, containersInDn);

        containerManager.notifyContainerReportProcessing(true, true);
      }
{code}

The Set "containersInSCM" comes from NodeStateMap:

{code}
  public Set<ContainerID> getContainers(UUID uuid)
      throws NodeNotFoundException {
    lock.readLock().lock();
    try {
      checkIfNodeExist(uuid);
      return Collections
          .unmodifiableSet(new HashSet<>(nodeToContainer.get(uuid)));
    } finally {
      lock.readLock().unlock();
    }
  }
{code}

This already returns a new copy of the set, so there is no need to wrap it in 
an unmodifiable view. Dropping the wrapper also means the report handler can 
mutate the copy directly rather than copying it again. As it stands, the code 
ends up with 3 copies of this potentially large set.
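To illustrate, here is a hypothetical sketch (not the real NodeStateMap; Long stands in for ContainerID and the class/field names are illustrative) of returning the defensive copy directly, so the caller is free to mutate it:

```java
import java.util.*;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch only. Returning the defensive copy directly lets the
// caller mutate it (e.g. subtract the reported set) without a second copy.
class NodeStateMapSketch {
  private final Map<UUID, Set<Long>> nodeToContainer = new HashMap<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  void addContainer(UUID node, long id) {
    lock.writeLock().lock();
    try {
      nodeToContainer.computeIfAbsent(node, k -> new HashSet<>()).add(id);
    } finally {
      lock.writeLock().unlock();
    }
  }

  Set<Long> getContainers(UUID node) {
    lock.readLock().lock();
    try {
      // One defensive copy is enough; no unmodifiableSet wrapper needed.
      return new HashSet<>(
          nodeToContainer.getOrDefault(node, Collections.emptySet()));
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

The caller can then subtract the reported containers from this copy in place, instead of making yet another HashSet first.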

Next we take the full container report (FCR) and stream it into a set of 
ContainerIDs. This set is used for two purposes: 1, to subtract from 
"containersInSCM" to yield the missing containers; 2, to replace the set 
already held in nodeManager.

We can avoid this second large set by simply adding each ContainerID to 
nodeManager if it is not already there, and then removing any "missing" 
entries from nodeManager at the end. This also avoids replacing the entire 
set of ContainerIDs, and all the tenured objects associated with it, with 
effectively identical new object instances.

Finally, we can process the replicas in the same pass and re-use the 
ContainerID object, which we currently create twice: one copy is stored in 
nodeManager, and another distinct copy in each ContainerReplica.
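Putting these pieces together, the subtraction and the nodeManager update can happen in one pass over the report, without building a "containersInDn" set at all. A hypothetical sketch (Long standing in for ContainerID, plain sets in place of the NodeManager API; names are illustrative, not the real Ozone code):

```java
import java.util.*;

// Hypothetical single-pass sketch of the proposed handling; not the real
// Ozone API. "nodeContainers" plays the role of the NodeManager's set for
// this datanode; "scmCopy" is the mutable copy returned by getContainers().
class ReportProcessorSketch {
  /** Updates nodeContainers in place from the reported IDs and returns
   *  the containers SCM knew about that the DN no longer reports. */
  static Set<Long> process(Set<Long> nodeContainers, Set<Long> scmCopy,
                           List<Long> reported) {
    for (Long id : reported) {
      // Add only if absent: existing (tenured) entries survive instead of
      // being replaced by effectively identical new instances. The same
      // "id" instance could also be handed to the per-replica processing,
      // avoiding the second ContainerID copy per replica.
      nodeContainers.add(id);
      // Whatever remains in scmCopy after this pass was not reported.
      scmCopy.remove(id);
    }
    // scmCopy now holds the missing replicas; drop them from the node's set.
    nodeContainers.removeAll(scmCopy);
    return scmCopy;
  }
}
```

This keeps a single large copy (the one from getContainers) live per report, instead of three.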

I checked how many ContainerID objects are present in SCM by loading 10 closed 
containers with 3 replicas each and capturing a heap histogram with jmap:

{code}
bash-4.2$ jmap -histo:live 8 | grep ContainerID
 262:            81           1944  org.apache.hadoop.hdds.scm.container.ContainerID
 301:            30           1440  org.apache.hadoop.hdds.scm.container.ContainerReplica
 419:            10            800  org.apache.hadoop.hdds.scm.container.ContainerInfo
{code}

There are 10 Containers as expected, 30 replicas, but 81 ContainerID objects. 
Ideally there should be 10, to match the number of Containers, but some of 
these may be created from other code paths, such as pipeline creation.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
