[jira] [Commented] (HDDS-2113) Update JMX metrics in SCMNodeMetrics for Decommission and Maintenance

Stephen O'Donnell (Jira) Wed, 11 Sep 2019 10:30:06 -0700


    [ 
https://issues.apache.org/jira/browse/HDDS-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927840#comment-16927840
 ]


Stephen O'Donnell commented on HDDS-2113:
-----------------------------------------

[~nandakumar131] [~anu] [~arpaga] I would appreciate your thoughts on this.

> Update JMX metrics in SCMNodeMetrics for Decommission and Maintenance
> ---------------------------------------------------------------------
>
>                 Key: HDDS-2113
>                 URL: https://issues.apache.org/jira/browse/HDDS-2113
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: SCM
>    Affects Versions: 0.5.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>
> Currently the class SCMNodeMetrics exposes JMX metrics for the number of 
> HEALTHY, STALE and DEAD nodes.
> It also exposes the disk capacity of the cluster and the amount of space used 
> and available.
> We need to decide how we want to display things in JMX when nodes are 
> entering or in maintenance, decommissioning, or decommissioned.
> We now have 15 states rather than the previous 3, as we can have nodes in:
>  * IN_SERVICE
>  * ENTERING_MAINTENANCE
>  * IN_MAINTENANCE
>  * DECOMMISSIONING
>  * DECOMMISSIONED
> And in each of these states, nodes can be:
>  * HEALTHY
>  * STALE
>  * DEAD
> The simplest option would be to expose these 15 states directly in JMX, as 
> that gives the complete picture, but I wonder if we need any summary JMX 
> metrics too?
>  
> We also need to consider how to count disk capacity and usage. For example:
>  # Do we count capacity and usage on a DECOMMISSIONING node? The answer is 
> not clear-cut: a decommissioning node does not provide any capacity for 
> writers in the cluster, but it does use capacity.
>  # For a DECOMMISSIONED node, we probably should not count capacity or usage.
>  # For an ENTERING_MAINTENANCE node, do we count capacity and usage? I 
> suspect we should include the capacity and usage in the totals, even though 
> a node in this state will not be available for writes.
>  # For an IN_MAINTENANCE node that is healthy, do we count capacity and usage?
>  # For an IN_MAINTENANCE node that is dead, do we count capacity and usage?
> I would welcome any thoughts on this before changing the code.
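
To make the "15 states" option concrete, here is a minimal sketch of how the combinations could be enumerated and rolled up into a per-operational-state summary. Everything in it is an assumption for illustration: the class name, the two enums, the OPERATIONALSTATE_HEALTH naming scheme and the summary shape are hypothetical and are not taken from the existing SCMNodeMetrics code.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the real SCMNodeMetrics class.
final class NodeStateMetricsSketch {

  // Hypothetical enum mirroring the operational states listed in the issue.
  enum OperationalState {
    IN_SERVICE, ENTERING_MAINTENANCE, IN_MAINTENANCE,
    DECOMMISSIONING, DECOMMISSIONED
  }

  // Hypothetical enum mirroring the health states listed in the issue.
  enum Health { HEALTHY, STALE, DEAD }

  // One metric name per (operational state, health) pair: 5 x 3 = 15.
  static List<String> metricNames() {
    List<String> names = new ArrayList<>();
    for (OperationalState op : OperationalState.values()) {
      for (Health h : Health.values()) {
        names.add(op.name() + "_" + h.name());
      }
    }
    return names;
  }

  // One possible summary metric: node count per operational state,
  // summed across the three health states. Missing keys count as 0.
  static Map<OperationalState, Integer> summarize(Map<String, Integer> counts) {
    Map<OperationalState, Integer> totals =
        new EnumMap<>(OperationalState.class);
    for (OperationalState op : OperationalState.values()) {
      int sum = 0;
      for (Health h : Health.values()) {
        sum += counts.getOrDefault(op.name() + "_" + h.name(), 0);
      }
      totals.put(op, sum);
    }
    return totals;
  }

  public static void main(String[] args) {
    // Print the 15 candidate metric names, one per line.
    metricNames().forEach(System.out::println);
  }
}
```

Exposing the 15 fine-grained counters and deriving summaries from them would keep the summaries consistent with the detailed counts; whether any summary is needed at all is still the open question raised in the description.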



