[
https://issues.apache.org/jira/browse/HADOOP-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Peeyush Bishnoi updated HADOOP-4938:
------------------------------------
Attachment: externalIdleTracker.py
The script to detect idle clusters has been attached for review. To run this
script, either set HOD_CONF_DIR or specify the path to the HOD conf dir with
the "-c" or "--conf" option on the script's command line.
To send a mail notification to the administrator when a cluster is still
alive even after deallocation, a mail daemon (i.e. SMTP) must be configured on
the machine on which this script runs.
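
For reviewers who have not opened the attachment, the two requirements above
can be illustrated with a minimal sketch. This is not the code in
externalIdleTracker.py: the function names, the precedence of the option over
the environment variable, the mail addresses, the message wording, and the use
of Python 3 with an SMTP daemon on "localhost" are all assumptions made for
illustration.

```python
import argparse
import smtplib
from email.message import EmailMessage


def resolve_conf_dir(argv, environ):
    """Return the HOD conf dir from -c/--conf, else from HOD_CONF_DIR.

    Assumption: the command-line option wins when both are given.
    """
    parser = argparse.ArgumentParser(description="idle cluster tracker (sketch)")
    parser.add_argument("-c", "--conf", dest="conf_dir", default=None,
                        help="path to the HOD conf dir")
    args = parser.parse_args(argv)
    conf_dir = args.conf_dir or environ.get("HOD_CONF_DIR")
    if not conf_dir:
        raise ValueError("set HOD_CONF_DIR or pass -c/--conf")
    return conf_dir


def build_alert(cluster_id, sender, recipient):
    """Build the notification for a cluster that is alive after deallocation."""
    msg = EmailMessage()
    msg["Subject"] = "HOD cluster %s still alive after deallocation" % cluster_id
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        "Cluster %s is still alive even after deallocation.\n" % cluster_id
    )
    return msg


def send_alert(msg, smtp_host="localhost"):
    """Hand the message to the local mail daemon (hypothetical host default)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

With these helpers, a caller would pass `sys.argv[1:]` and `os.environ` to
`resolve_conf_dir`, and call `send_alert(build_alert(...))` only when a cluster
survives deallocation. Keeping message construction separate from delivery
lets the message be checked without a running SMTP daemon.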
---
> [HOD] Cleanup idle HOD clusters whose ringmaster nodes might have gone down
> ---------------------------------------------------------------------------
>
> Key: HADOOP-4938
> URL: https://issues.apache.org/jira/browse/HADOOP-4938
> Project: Hadoop Core
> Issue Type: Improvement
> Components: contrib/hod
> Reporter: Hemanth Yamijala
> Assignee: Peeyush Bishnoi
> Attachments: externalIdleTracker.py
>
>
> As mentioned in HADOOP-4937, sometimes in large cluster deployments, faulty
> nodes on which the ringmaster process comes up may go down after the cluster
> is successfully allocated. Such clusters fail to deallocate automatically
> even if the idleness limit of the cluster is exceeded. This is because the
> idleness is tracked by the ringmaster process which itself has gone down.
> As a large number of nodes can get held up due to this, such clusters should
> be detected and deallocated in some manner.