sodonnel opened a new pull request #3085:
URL: https://github.com/apache/ozone/pull/3085


   ## What changes were proposed in this pull request?
   
   The container report handling code has some issues which this Jira intends to address.
   
   When handling full container reports, we make several copies of large sets to identify containers that are known to SCM but were not reported by the DN (i.e. replicas that have somehow been lost on the DNs).
   
   Looking at the current code:
   ```
         synchronized (datanodeDetails) {
           final List<ContainerReplicaProto> replicas =
               containerReport.getReportsList();
           final Set<ContainerID> containersInSCM =
               nodeManager.getContainers(datanodeDetails);
   
           final Set<ContainerID> containersInDn = replicas.parallelStream()
               .map(ContainerReplicaProto::getContainerID)
               .map(ContainerID::valueOf).collect(Collectors.toSet());
   
           final Set<ContainerID> missingReplicas =
               new HashSet<>(containersInSCM);
           missingReplicas.removeAll(containersInDn);
   
           processContainerReplicas(datanodeDetails, replicas, publisher);
           processMissingReplicas(datanodeDetails, missingReplicas);
   
           /*
            * Update the latest set of containers for this datanode in
            * NodeManager
            */
           nodeManager.setContainers(datanodeDetails, containersInDn);
   
           containerManager.notifyContainerReportProcessing(true, true);
         }
   ```
   The Set "containersInSCM" comes from NodeStateMap:
   ```
     public Set<ContainerID> getContainers(UUID uuid)
         throws NodeNotFoundException {
       lock.readLock().lock();
       try {
         checkIfNodeExist(uuid);
         return Collections
             .unmodifiableSet(new HashSet<>(nodeToContainer.get(uuid)));
       } finally {
         lock.readLock().unlock();
       }
     }
   ```
   This already returns a new copy of the set, so there is no need to wrap it in an unmodifiable view, and the report handler can avoid copying it yet again. As it stands, the code ends up with three copies of this potentially large set.
   
   Next we take the FCR and stream it into a set of ContainerID objects. This set is used for two purposes: first, to subtract from "containersInSCM" to yield the missing containers; second, to replace the set already held in nodeManager.
   
   We can avoid this second large set by simply adding each ContainerID to nodeManager if it is not already there, and then removing any "missing" IDs from nodeManager at the end. This also avoids replacing the entire set of ContainerIDs, and all the tenured objects associated with it, with new, effectively identical object instances.
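   The incremental update described above can be sketched as follows (again with plain Longs standing in for ContainerID; `applyReport` is a hypothetical helper, not the actual PR code):

   ```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IncrementalUpdateSketch {

  // Update the node's container set in place from a full report.
  // add() is a no-op for IDs already present, so existing (often tenured)
  // ID objects are kept rather than replaced wholesale; only the genuinely
  // missing entries are removed at the end. Returns the missing IDs.
  static Set<Long> applyReport(Set<Long> nodeManagerSet, List<Long> reported) {
    Set<Long> missing = new HashSet<>(nodeManagerSet);
    for (Long id : reported) {
      nodeManagerSet.add(id);   // new container: added; existing: untouched
      missing.remove(id);       // seen in the report, so not missing
    }
    nodeManagerSet.removeAll(missing); // drop containers the DN lost
    return missing;
  }

  public static void main(String[] args) {
    Set<Long> nodeSet = new HashSet<>(Set.of(1L, 2L, 3L));
    Set<Long> missing = applyReport(nodeSet, List.of(1L, 3L, 4L));
    System.out.println(missing); // prints [2]
  }
}
   ```

   The point of the sketch is that no second large set of reported IDs is ever materialised; the only transient set is the shrinking "missing" set.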
   
   Finally, we can process the replicas in the same pass and reuse the ContainerID object, which is currently created twice: one copy is stored in nodeManager, and another distinct copy in each ContainerReplica.
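   The reuse amounts to constructing each ID once and sharing the instance between the node's container set and the replica record. A small illustrative sketch, with a hypothetical `Id` record standing in for ContainerID:

   ```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class IdReuseSketch {

  // Hypothetical immutable wrapper standing in for ContainerID.
  record Id(long value) { }

  public static void main(String[] args) {
    Set<Id> nodeManagerSet = new HashSet<>();
    // Stand-in for the per-replica records built by the handler.
    Map<Long, Id> replicas = new HashMap<>();

    for (long raw : new long[] {1L, 2L, 3L}) {
      Id id = new Id(raw);       // construct the ID exactly once...
      nodeManagerSet.add(id);    // ...store it in the node's set...
      replicas.put(raw, id);     // ...and share it with the replica
    }

    // Both structures now hold the identical instance, not equal copies,
    // so the heap carries one ID object per container rather than two.
    Id fromSet = nodeManagerSet.stream()
        .filter(i -> i.value() == 2L).findFirst().orElseThrow();
    System.out.println(fromSet == replicas.get(2L)); // prints true
  }
}
   ```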
   
   I checked how many ContainerID objects are present in SCM by loading 10 closed containers with 3 replicas each, and capturing a heap histogram with jmap:
   
   ```
   bash-4.2$ jmap -histo:live 8 | grep ContainerID
    262:            81           1944  org.apache.hadoop.hdds.scm.container.ContainerID
    301:            30           1440  org.apache.hadoop.hdds.scm.container.ContainerReplica
    419:            10            800  org.apache.hadoop.hdds.scm.container.ContainerInfo
   ```
   There are 10 containers as expected and 30 replicas, but 81 ContainerID objects. Ideally there should be only 10, to match the number of containers, though some of the extras may be created on other code paths, such as pipeline creation.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-6307
   
   ## How was this patch tested?
   
   Existing tests cover the functionality.
   
   For performance testing, I used the existing Freon SCMThroughputBenchmark class. With it, I simulated FCR handling for a single DN reporting 10k, 20k and 30k containers. The original code took approximately:
   
   ```
   10k: 750 - 800ms per report.
   20k: 1550 - 1650ms per report.
   30k: 2300 - 2500ms per report.
   ```
   
   The code in this PR:
   
   ```
   10k: 530 - 570ms per report.
   20k: 1100 - 1200ms per report.
   30k: 1550 - 1650ms per report.
   ```
   
   Both scale proportionally with the number of containers, but the revised 
code is about 30% faster and has a reduced memory overhead.
   
   After loading 30k containers with 1 replica each I checked the ContainerID 
object count using `jmap -histo:live`:
   
   ```
   Original:
     13:        120000        2880000  org.apache.hadoop.hdds.scm.container.ContainerID
   
   Revised:
     37:         30000         720000  org.apache.hadoop.hdds.scm.container.ContainerID
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


