[ 
https://issues.apache.org/jira/browse/HADOOP-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peeyush Bishnoi updated HADOOP-4937:
------------------------------------

    Attachment: hadoop-4937.txt

The attached patch populates the resource manager's 'notes' attribute with 
the ringmaster RPC port information so that it is centrally available.
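
As a rough illustration of how a monitoring process might consume this, the 
sketch below pulls a ringmaster host and port out of a notes string. The 
notes layout shown here (semicolon-separated key=value pairs) and the key 
name "ringmaster" are assumptions made for the example only; they are not 
taken from the patch, so the parsing would need to match whatever format 
the patch actually writes.

```python
def parse_notes(notes):
    """Split a 'key=value;key=value' string into a dict.

    The layout is a hypothetical one chosen for this example, not the
    format written by the patch.
    """
    fields = {}
    for item in notes.split(";"):
        item = item.strip()
        if not item or "=" not in item:
            continue  # skip empty or malformed entries
        key, value = item.split("=", 1)
        fields[key.strip()] = value.strip()
    return fields


def ringmaster_address(notes, key="ringmaster"):
    """Return (host, port) for the ringmaster entry, or None.

    'key' is a hypothetical field name; None is returned when the entry
    is missing or is not of the form host:port.
    """
    value = parse_notes(notes).get(key)
    if value is None or ":" not in value:
        return None
    host, port = value.rsplit(":", 1)
    if not port.isdigit():
        return None
    return host, int(port)
```

A monitor could then try to connect to the returned address for each 
suspicious cluster and flag those whose ringmaster does not respond.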

---

> [HOD] Include ringmaster RPC port information in the notes attribute
> --------------------------------------------------------------------
>
>                 Key: HADOOP-4937
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4937
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: contrib/hod
>            Reporter: Hemanth Yamijala
>            Assignee: Peeyush Bishnoi
>         Attachments: hadoop-4937.txt
>
>
> In large cluster deployments, node failures sometimes cause HOD clusters to 
> be allocated but never deallocated, even after the cluster's idleness limit 
> (the time for which no jobs are run) has been exceeded. One of the main 
> reasons is that the ringmaster process, which is responsible for tracking 
> and cleaning up an idle cluster (of which it is a part), itself goes down. 
> To handle such scenarios, it makes sense to centrally track the ringmaster 
> nodes of suspicious clusters. But because the port to which the ringmaster 
> is bound is not centrally available, such monitoring is currently 
> impossible.
> This issue is an enhancement request to include ringmaster RPC port 
> information along with the JT and NN info as part of the resource manager's 
> notes attribute so that it can be used by any monitoring processes built 
> around it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
