Stephen O'Donnell created HDDS-8074:
---------------------------------------

             Summary: Improve synchronization around command queue updates in Node Manager
                 Key: HDDS-8074
                 URL: https://issues.apache.org/jira/browse/HDDS-8074
             Project: Apache Ozone
          Issue Type: Sub-task
          Components: SCM
            Reporter: Stephen O'Donnell
            Assignee: Stephen O'Donnell


The total number of commands pending for a datanode is the sum of the commands
in the NodeManager CommandQueue and the number of commands the DN reported in
its previous heartbeat.
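The same definition written as a formula. The per-command-type index c and the symbols Q and R are notation introduced here for illustration; they are not names taken from the Ozone code:

```latex
\mathrm{pending}(d, c) = Q_{\mathrm{SCM}}(d, c) + R_{\mathrm{DN}}(d, c)
```

Here Q_SCM(d, c) is the number of type-c commands waiting in the NodeManager CommandQueue for datanode d, and R_DN(d, c) is the type-c count that d reported in its previous heartbeat.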

As things stand, these two pieces of information come from two different
methods, each with its own locking, so the combined result can be inconsistent.

To allow a consistent view of the commands queued for a datanode, this PR:

1. Adds a read-write lock to SCMNodeManager that guards updates to the command
queue, updates to the DN queue counts during heartbeat processing, and queries
of those counts.

2. Moves the CommandQueueReportProcessing from asynchronous handling to being
processed synchronously as part of the heartbeat in SCM. This avoids a window
where the command queue has already been emptied, but the pending count inside
DatanodeInfo has not yet been updated.

3. An earlier PR added a low-priority flag to ReplicateContainer commands, so
that the balancer can send commands with a lower priority. The DN does not
include these low-priority commands in its reported counts, so the command
queue has been adjusted to exclude them as well.
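The three changes above can be sketched together as a small Java model. This is a minimal, hypothetical sketch, not the SCMNodeManager code: the names NodeCommandState, Cmd, CmdType, addCommand, processHeartbeat and totalPending are invented for illustration. It assumes a single ReentrantReadWriteLock guards both the SCM-side queue and the DN-reported counts, that a heartbeat drains the queue and records the DN's report under the same write lock, and (an assumption beyond the description above) that commands handed out in the heartbeat response are added to the reported counts so the total does not dip until the next report. Low-priority commands are still sent, but excluded from both counts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical command types, for illustration only.
enum CmdType { REPLICATE_CONTAINER, DELETE_CONTAINER }

// Hypothetical command; lowPriority mirrors the flag on ReplicateContainer.
final class Cmd {
  final CmdType type;
  final boolean lowPriority;

  Cmd(CmdType type, boolean lowPriority) {
    this.type = type;
    this.lowPriority = lowPriority;
  }
}

// Hypothetical model: queue and DN counts guarded by one read-write lock.
final class NodeCommandState {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  // Commands queued in SCM but not yet sent to each datanode.
  private final Map<UUID, List<Cmd>> queued = new HashMap<>();
  // Per-type counts for each datanode, as of its last heartbeat.
  private final Map<UUID, Map<CmdType, Integer>> reported = new HashMap<>();

  void addCommand(UUID dn, Cmd cmd) {
    lock.writeLock().lock();
    try {
      queued.computeIfAbsent(dn, k -> new ArrayList<>()).add(cmd);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Drains the queue and applies the DN report in one critical section,
  // so no reader can see the drained queue with the stale report.
  List<Cmd> processHeartbeat(UUID dn, Map<CmdType, Integer> dnReport) {
    lock.writeLock().lock();
    try {
      List<Cmd> toSend = queued.remove(dn);
      if (toSend == null) {
        toSend = new ArrayList<>();
      }
      Map<CmdType, Integer> counts = new HashMap<>(dnReport);
      // Assumption: commands sent now count as pending on the DN; skip
      // low-priority ones because the DN does not report them.
      for (Cmd c : toSend) {
        if (!c.lowPriority) {
          counts.merge(c.type, 1, Integer::sum);
        }
      }
      reported.put(dn, counts);
      return toSend;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Queued (excluding low priority) plus reported, read under one lock.
  int totalPending(UUID dn, CmdType type) {
    lock.readLock().lock();
    try {
      int inQueue = 0;
      for (Cmd c : queued.getOrDefault(dn, List.of())) {
        if (c.type == type && !c.lowPriority) {
          inQueue++;
        }
      }
      int onDn = reported.getOrDefault(dn, Map.of()).getOrDefault(type, 0);
      return inQueue + onDn;
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Because every writer and reader goes through the same lock, a call to totalPending sees either the state before a heartbeat or the state after it, never a mix of the two. The trade-off is that command queueing and heartbeat processing now serialize on one write lock.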



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
