[
https://issues.apache.org/jira/browse/HADOOP-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Peeyush Bishnoi updated HADOOP-4938:
------------------------------------
Attachment: hadoop-4938.txt
Thanks, Hemanth, for reviewing the script and providing valuable suggestions.
I am attaching the externalidletracker.py script as a patch after incorporating
all your suggestions.
This patch supports multiple recipients if we add a "Cc" parameter.
It can also connect to a remote SMTP server to send the mails.
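For illustration, here is a minimal sketch of how a mail with "Cc" recipients could be sent through a remote SMTP server using Python's standard smtplib and email modules. It is not taken from the attached patch: the function names, the host smtp.example.com, the port, and all addresses are hypothetical placeholders.

```python
import smtplib
from email.message import EmailMessage


def build_notification(sender, to_addrs, cc_addrs, subject, body):
    """Build the mail and the full envelope recipient list (To plus Cc)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(to_addrs)
    if cc_addrs:
        # The Cc header is only added when Cc recipients were supplied.
        msg["Cc"] = ", ".join(cc_addrs)
    msg["Subject"] = subject
    msg.set_content(body)
    # The SMTP envelope must list every recipient, not only those in "To".
    recipients = list(to_addrs) + list(cc_addrs)
    return msg, recipients


def send_notification(msg, recipients,
                      smtp_host="smtp.example.com", smtp_port=25):
    """Deliver the mail via a remote SMTP server (host is a placeholder)."""
    with smtplib.SMTP(smtp_host, smtp_port) as server:
        server.send_message(msg, to_addrs=recipients)
```

A caller would first build the message, for example with build_notification("hod@example.com", ["admin@example.com"], ["ops@example.com"], "Idle HOD cluster", "..."), and then pass the returned message and recipient list to send_notification along with the real SMTP host.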
Comments?
---
> [HOD] Cleanup idle HOD clusters whose ringmaster nodes might have gone down
> ---------------------------------------------------------------------------
>
> Key: HADOOP-4938
> URL: https://issues.apache.org/jira/browse/HADOOP-4938
> Project: Hadoop Core
> Issue Type: Improvement
> Components: contrib/hod
> Reporter: Hemanth Yamijala
> Assignee: Peeyush Bishnoi
> Attachments: externalIdleTracker.py, hadoop-4938.txt
>
>
> As mentioned in HADOOP-4937, sometimes in large cluster deployments, faulty
> nodes on which the ringmaster process comes up may go down after the cluster
> is successfully allocated. Such clusters fail to deallocate automatically
> even if the idleness limit of the cluster is exceeded. This is because the
> idleness is tracked by the ringmaster process which itself has gone down.
> As a large number of nodes can get held up due to this, such clusters should be
> detected and deallocated in some manner.
--
This message is automatically generated by JIRA.