[ https://issues.apache.org/jira/browse/HADOOP-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669554#action_12669554 ]
Hemanth Yamijala commented on HADOOP-4938:
------------------------------------------

Peeyush, as we discussed, please make the following changes:
- Pass options as command-line parameters. I think this will be easier to manage for now. Look at how logcondense.py works.
- The state file and log file locations should be configurable. The defaults can be /tmp and /var/log.
- The code checks whether the sum of runningJobs and submittedJobs is less than the number stored in the state file. Since submittedJobs already includes runningJobs, you don't need to sum them.
- The SMTP recipient address should be configurable. Also, does the library you are using support multiple addresses and a remote SMTP host?
- Submit this as a patch. I think the file should be under $HOD_HOME/support.
- Include the ASF header in the file.
- Can you also submit documentation for this in Forrest?

> [HOD] Cleanup idle HOD clusters whose ringmaster nodes might have gone down
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-4938
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4938
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hod
>            Reporter: Hemanth Yamijala
>            Assignee: Peeyush Bishnoi
>         Attachments: externalIdleTracker.py
>
>
> As mentioned in HADOOP-4937, sometimes in large cluster deployments, faulty
> nodes on which the ringmaster process comes up may go down after the cluster
> is successfully allocated. Such clusters fail to deallocate automatically
> even if the idleness limit of the cluster is exceeded. This is because the
> idleness is tracked by the ringmaster process, which itself has gone down.
> As a large number of nodes can get held up due to this, such clusters should be
> detected and deallocated in some manner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
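The configuration points raised in the review comment (command-line options, configurable state and log directories defaulting to /tmp and /var/log, configurable recipients, a remote SMTP host) can be sketched in Python, the language of the attached externalIdleTracker.py. This is a minimal illustration that assumes only the standard library argparse, email and smtplib modules; the option names, the default sender address and the state/log file names are hypothetical and are not taken from externalIdleTracker.py or logcondense.py. On the SMTP question: smtplib.SMTP accepts any host name, so a remote server works, and send_message takes a list of recipient addresses.

```python
import argparse
import os
import smtplib
from email.message import EmailMessage

# Hypothetical file names; the real tracker may name these differently.
STATE_FILE_NAME = "idleTracker.state"
LOG_FILE_NAME = "idleTracker.log"


def parse_args(argv=None):
    """Parse command-line options (option names are illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Sketch: report idle HOD clusters")
    # State and log locations are configurable, defaulting to /tmp and /var/log.
    parser.add_argument("--state-dir", default="/tmp",
                        help="directory for the tracker state file")
    parser.add_argument("--log-dir", default="/var/log",
                        help="directory for the tracker log file")
    # nargs="+" accepts one or more addresses after --mail-to.
    parser.add_argument("--mail-to", nargs="+", default=[],
                        help="one or more recipient addresses")
    # Hypothetical default sender address.
    parser.add_argument("--mail-from", default="hod-idle-tracker@localhost",
                        help="sender address")
    # The SMTP server does not have to run on the local machine.
    parser.add_argument("--smtp-host", default="localhost",
                        help="SMTP server host name")
    parser.add_argument("--smtp-port", type=int, default=25,
                        help="SMTP server port")
    return parser.parse_args(argv)


def build_message(sender, recipients, subject, body):
    """Build a plain-text message addressed to every recipient."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_report(msg, recipients, host, port):
    """Deliver msg through the given (possibly remote) SMTP server."""
    with smtplib.SMTP(host, port) as server:
        # to_addrs takes a list, so one call reaches every recipient.
        server.send_message(msg, to_addrs=recipients)


if __name__ == "__main__":
    args = parse_args()
    state_file = os.path.join(args.state_dir, STATE_FILE_NAME)
    log_file = os.path.join(args.log_dir, LOG_FILE_NAME)
    print("state file:", state_file)
    print("log file:", log_file)
    # Only contact an SMTP server when recipients were given.
    if args.mail_to:
        report = build_message(args.mail_from, args.mail_to,
                               "HOD idle cluster report",
                               "Idle cluster details would go here.")
        send_report(report, args.mail_to, args.smtp_host, args.smtp_port)
```

Saved under a hypothetical name such as tracker_sketch.py and run as `python tracker_sketch.py --mail-to a@example.com b@example.com --smtp-host mail.example.com`, the sketch prints the resolved state and log file paths and sends one message to both recipients through the remote host. Keeping the defaults in argparse means every location has a sensible value without a separate configuration file.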