Stephen O'Donnell created HDDS-6307:
---------------------------------------
Summary: Improve processing and memory efficiency of container reports
Key: HDDS-6307
URL: https://issues.apache.org/jira/browse/HDDS-6307
Project: Apache Ozone
Issue Type: Improvement
Components: SCM
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell
The container report handling code has some issues which this Jira intends to
address.
When handling full container reports, we make several copies of large sets to
identify containers that SCM knows about but the DN did not report (they have
somehow been lost on the DN).
Looking at the current code:
{code}
synchronized (datanodeDetails) {
  final List<ContainerReplicaProto> replicas =
      containerReport.getReportsList();
  final Set<ContainerID> containersInSCM =
      nodeManager.getContainers(datanodeDetails);
  final Set<ContainerID> containersInDn = replicas.parallelStream()
      .map(ContainerReplicaProto::getContainerID)
      .map(ContainerID::valueOf).collect(Collectors.toSet());
  final Set<ContainerID> missingReplicas = new HashSet<>(containersInSCM);
  missingReplicas.removeAll(containersInDn);
  processContainerReplicas(datanodeDetails, replicas, publisher);
  processMissingReplicas(datanodeDetails, missingReplicas);
  /*
   * Update the latest set of containers for this datanode in
   * NodeManager
   */
  nodeManager.setContainers(datanodeDetails, containersInDn);
  containerManager.notifyContainerReportProcessing(true, true);
}
{code}
The Set "containersInSCM" comes from NodeStateMap:
{code}
public Set<ContainerID> getContainers(UUID uuid)
    throws NodeNotFoundException {
  lock.readLock().lock();
  try {
    checkIfNodeExist(uuid);
    return Collections
        .unmodifiableSet(new HashSet<>(nodeToContainer.get(uuid)));
  } finally {
    lock.readLock().unlock();
  }
}
{code}
This already returns a new copy of the set, so there is no need to wrap it in
Collections.unmodifiableSet. Returning the copy directly lets the report
handler modify it in place instead of copying it again.
The current code ends up with 3 copies of this potentially large set.
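To illustrate the point about the copy, here is a minimal sketch of a getter that hands back its private copy without the unmodifiable wrapper. It is not the real NodeStateMap: the class name, the String node ids, the Long container ids and the IllegalArgumentException (standing in for NodeNotFoundException) are all assumptions made to keep it self-contained.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified stand-in for NodeStateMap: String node ids and
// Long container ids replace the real UUID and ContainerID types.
final class NodeContainerMapSketch {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, Set<Long>> nodeToContainer = new HashMap<>();

  void setContainers(String node, Set<Long> containers) {
    lock.writeLock().lock();
    try {
      nodeToContainer.put(node, new HashSet<>(containers));
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Returns a private copy; the caller owns it and may mutate it freely,
  // so no unmodifiable wrapper (and no second copy) is needed.
  Set<Long> getContainers(String node) {
    lock.readLock().lock();
    try {
      Set<Long> current = nodeToContainer.get(node);
      if (current == null) {
        throw new IllegalArgumentException("Unknown node: " + node);
      }
      return new HashSet<>(current);
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Because the caller gets its own HashSet, removing entries from it does not touch the map's internal set.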
Next we take the FCR (full container report) and stream it into a set of
ContainerID. This set is used for two purposes: 1, to subtract from
"containersInSCM" to yield the missing containers; 2, to replace the set
already in nodeManager.
We can avoid this second large set by simply adding each ContainerID to
nodeManager if it is not already there, and then removing any "missing" IDs
from nodeManager at the end. This also avoids replacing the entire set of
ContainerIDs, and all the tenured objects associated with it, with new but
effectively identical object instances.
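A rough sketch of that single-pass idea follows. It is only an illustration under assumptions: ReportReconciler and reconcile are hypothetical names, plain Long values stand in for ContainerID, and a mutable Set stands in for the per-datanode set held by nodeManager.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; "tracked" plays the role of the set nodeManager
// keeps for one datanode, "reported" the IDs found in the full report.
final class ReportReconciler {

  /**
   * Updates "tracked" in place so that it matches the report and returns
   * the IDs that were tracked but not reported (the "missing" ones).
   */
  static Set<Long> reconcile(Set<Long> tracked, List<Long> reported) {
    // One copy of the SCM view; entries are ticked off as the report is read.
    Set<Long> missing = new HashSet<>(tracked);
    for (Long id : reported) {
      if (!missing.remove(id)) {
        // Not previously known for this datanode (or a repeated entry).
        tracked.add(id);
      }
    }
    // Whatever is left was known to SCM but absent from the report.
    tracked.removeAll(missing);
    return missing;
  }
}
```

With tracked = {1, 2, 3} and a report of [2, 3, 4], the call returns {1} and leaves tracked as {2, 3, 4}. Entries 2 and 3 are never replaced, so their existing objects stay in the set.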
Finally, we can process the replicas at the same time and re-use the
ContainerID object, which we currently create twice. We store one copy in the
nodeManager, and then another distinct copy in each ContainerReplica.
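As a sketch of what sharing the object could look like, the loop below builds one ID per report entry and hands the same reference to both the node's set and the replica. The Id and Replica classes are hypothetical stand-ins for ContainerID and ContainerReplica, not the real types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class SharedIdSketch {

  // Hypothetical stand-in for ContainerID.
  static final class Id {
    private final long value;
    Id(long value) { this.value = value; }
    @Override public boolean equals(Object o) {
      return o instanceof Id && ((Id) o).value == value;
    }
    @Override public int hashCode() { return Long.hashCode(value); }
  }

  // Hypothetical stand-in for ContainerReplica.
  static final class Replica {
    final Id containerId;
    Replica(Id containerId) { this.containerId = containerId; }
  }

  static List<Replica> process(Set<Id> nodeContainers, long[] reportedIds) {
    List<Replica> replicas = new ArrayList<>();
    for (long raw : reportedIds) {
      Id id = new Id(raw);            // built once per report entry
      nodeContainers.add(id);         // stored in the node's set if not yet present
      replicas.add(new Replica(id));  // the replica holds the same reference
    }
    return replicas;
  }
}
```

For a container the node's set did not already hold, the set and the replica end up pointing at one object instead of two equal ones.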
I checked how many ContainerID objects are present in SCM, by loading 10 closed
containers with 3 replicas, and capturing a heap histogram with jmap:
{code}
bash-4.2$ jmap -histo:live 8 | grep ContainerID
 262:            81           1944  org.apache.hadoop.hdds.scm.container.ContainerID
 301:            30           1440  org.apache.hadoop.hdds.scm.container.ContainerReplica
 419:            10            800  org.apache.hadoop.hdds.scm.container.ContainerInfo
{code}
There are 10 Containers as expected, 30 replicas, but 81 ContainerID objects.
Ideally there should be 10, to match the number of Containers, but some of
these may be created from other code paths, such as pipeline creation.