[ https://issues.apache.org/jira/browse/HDDS-8074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stephen O'Donnell resolved HDDS-8074.
-------------------------------------
Fix Version/s: 1.4.0
Resolution: Fixed
> Improve synchronization around command queue updates in Node Manager
> --------------------------------------------------------------------
>
> Key: HDDS-8074
> URL: https://issues.apache.org/jira/browse/HDDS-8074
> Project: Apache Ozone
> Issue Type: Sub-task
> Components: SCM
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.4.0
>
>
> The total number of commands pending for a datanode is the sum of the commands
> on the NodeManager CommandQueue and the number of commands the DN reported as
> queued in its previous heartbeat.
> As things stand, these two pieces of information come from two different
> methods, each with its own locking, so the combined result can be inconsistent.
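The following is a simplified Java model of the problem. It assumes a single datanode and uses hypothetical class and method names that are not Ozone's real API. The SCM-side queue and the DN-reported count each have their own lock, and a heartbeat updates them in two separate steps. For illustration only, it also assumes that the DN's next reported count includes the commands just handed out. A query that runs between the two steps sees a total that matches neither the state before the heartbeat nor the state after it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model; names are illustrative, not Ozone's API.
class SplitLockCounts {
    // Commands SCM has queued for one datanode, guarded by this object's lock.
    static final class ScmCommandQueue {
        private final Deque<String> commands = new ArrayDeque<>();

        synchronized void add(String c) { commands.add(c); }

        synchronized int size() { return commands.size(); }

        // Hands every queued command to the caller and empties the queue.
        synchronized int drain() {
            int n = commands.size();
            commands.clear();
            return n;
        }
    }

    // Count the datanode reported in its last heartbeat, with its own lock.
    static final class ReportedCount {
        private int value;

        synchronized void set(int v) { value = v; }

        synchronized int get() { return value; }
    }

    final ScmCommandQueue queue = new ScmCommandQueue();
    final ReportedCount reported = new ReportedCount();

    // Each read is locked on its own, but the pair is not read atomically.
    int totalPending() { return queue.size() + reported.get(); }

    public static void main(String[] args) {
        SplitLockCounts dn = new SplitLockCounts();
        for (int i = 0; i < 4; i++) {
            dn.queue.add("cmd-" + i);
        }
        dn.reported.set(1);
        System.out.println("before heartbeat: " + dn.totalPending()); // 5

        // The heartbeat updates the two values in two separate steps.
        int sent = dn.queue.drain();
        // A query landing here sees an intermediate, inconsistent total.
        System.out.println("mid heartbeat:    " + dn.totalPending()); // 1
        // Assumption for illustration: the new report includes the sent commands.
        dn.reported.set(1 + sent);
        System.out.println("after heartbeat:  " + dn.totalPending()); // 5
    }
}
```

The same sequence runs in a single thread here so the intermediate state is reproducible. With real concurrency, a reader thread can land in that gap at any time.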
> To allow a consistent view of the commands queued on a datanode, this PR:
> 1. Adds a read-write lock to SCMNodeManager so it can lock around updates to
> the command queue, updates to the DN queue count during heartbeat processing,
> and queries of the counts.
> 2. Makes CommandQueueReportProcessing synchronous: the report is now processed
> as part of the heartbeat in SCM rather than asynchronously. This avoids a
> problem where the command queue has been emptied but the pending count inside
> DatanodeInfo has not yet been updated.
> 3. In an earlier PR, a low-priority flag was added to ReplicateContainer
> commands so that the balancer can send commands with a lower priority. The
> DN does not include these low-priority commands in its reported counts, so the
> command queue has been adjusted to exclude them as well.
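The three changes can be sketched together in Java. This sketch assumes a single datanode's queue and uses hypothetical class and method names that are not SCMNodeManager's actual code. A `ReentrantReadWriteLock` guards both values. Queue updates and heartbeat processing take the write lock, and the DN-reported count is applied synchronously inside the heartbeat's critical section. Queries take the read lock, and low-priority commands are excluded from the count. One further assumption is not stated in the issue. When commands are handed out, the sketch adds the counted ones to the stored figure so the total does not dip until the next report arrives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; names are illustrative, not SCMNodeManager's real API.
class LockedCommandCounts {
    // A queued command; lowPriority stands in for the ReplicateContainer flag.
    static final class Command {
        final String type;
        final boolean lowPriority;

        Command(String type, boolean lowPriority) {
            this.type = type;
            this.lowPriority = lowPriority;
        }
    }

    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<Command> queue = new ArrayList<>();
    private int reportedByDatanode;

    // Change 1: queue updates take the write lock.
    void addCommand(Command c) {
        lock.writeLock().lock();
        try {
            queue.add(c);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Change 3: low-priority commands are not counted. Caller must hold the lock.
    private int countedInQueue() {
        int n = 0;
        for (Command c : queue) {
            if (!c.lowPriority) {
                n++;
            }
        }
        return n;
    }

    // Change 2: the DN's report is applied in the same critical section that
    // hands out queued commands. Adding the sent commands is an assumption.
    List<Command> processHeartbeat(int countReportedByDatanode) {
        lock.writeLock().lock();
        try {
            reportedByDatanode = countReportedByDatanode + countedInQueue();
            List<Command> toSend = new ArrayList<>(queue);
            queue.clear();
            return toSend;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Change 1: queries take the read lock, so they never see a half-applied heartbeat.
    int totalPending() {
        lock.readLock().lock();
        try {
            return countedInQueue() + reportedByDatanode;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        LockedCommandCounts dn = new LockedCommandCounts();
        dn.addCommand(new Command("replicate", false));
        dn.addCommand(new Command("replicate", true)); // low priority, e.g. balancer
        dn.addCommand(new Command("delete", false));
        System.out.println(dn.totalPending()); // 2
        List<Command> sent = dn.processHeartbeat(3);
        System.out.println(sent.size() + " sent, total " + dn.totalPending()); // 3 sent, total 5
    }
}
```

A read-write lock fits this pattern because count queries are frequent and can proceed in parallel, while heartbeats and queue updates are the only writers.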
--
This message was sent by Atlassian Jira
(v8.20.10#820010)