[HOD] Include ringmaster RPC port information in the notes attribute
--------------------------------------------------------------------

                 Key: HADOOP-4937
                 URL: https://issues.apache.org/jira/browse/HADOOP-4937
             Project: Hadoop Core
          Issue Type: New Feature
          Components: contrib/hod
            Reporter: Hemanth Yamijala
            Assignee: Peeyush Bishnoi


In large cluster deployments, node failures sometimes leave HOD clusters 
allocated but never deallocated, even after the cluster has exceeded its 
idleness limit (the time for which no jobs are run). One of the main reasons 
is that the ringmaster process, which is responsible for tracking and cleaning 
up the idle cluster it belongs to, itself goes down. To handle such scenarios 
it makes sense to centrally track the ringmaster nodes of suspicious clusters. 
But since the port the ringmaster is bound to is not centrally available, 
such monitoring is currently impossible.

This issue is an enhancement request to include ringmaster RPC port information 
along with the JT and NN info as part of the resource manager's notes attribute 
so that it can be used by any monitoring processes built around it.
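
As a rough illustration of how a monitoring process might use this, the 
sketch below parses a notes string and checks whether the ringmaster port 
still accepts connections. The notes format shown ("key=host:port" fields 
separated by ";"), the key names jt, nn and rm, and the helper names 
parse_notes and is_reachable are all assumptions made for this example; 
this issue does not define the actual format.

```python
import socket


def parse_notes(notes):
    """Parse a HYPOTHETICAL notes string of 'key=host:port' fields
    separated by ';' into a dict of key -> (host, port)."""
    endpoints = {}
    for field in notes.split(";"):
        field = field.strip()
        if not field:
            continue
        key, _, addr = field.partition("=")
        host, _, port = addr.rpartition(":")
        endpoints[key.strip()] = (host, int(port))
    return endpoints


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ringmaster_alive(notes):
    """Check the ringmaster endpoint (assumed key 'rm') from the notes."""
    endpoints = parse_notes(notes)
    if "rm" not in endpoints:
        return False  # no ringmaster info published; treat as suspicious
    host, port = endpoints["rm"]
    return is_reachable(host, port)
```

A central monitor could call ringmaster_alive() with the notes value read 
from the resource manager for each allocated cluster and flag those that 
return False. A plain TCP connect only shows that something is listening on 
the port; a real check would probably issue an RPC call to the ringmaster.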

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
