[ https://issues.apache.org/jira/browse/HADOOP-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hemanth Yamijala updated HADOOP-2847:
-------------------------------------
Attachment: hadoop-2847
This patch adds error handling around the code that calls the hadoop client
to determine the number of running jobs. If an exception is thrown there,
typically due to a SocketTimeout or SocketException, the error code from the
hadoop client is captured and used to determine the idle time.
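
The following is a minimal sketch of the idea, not the code in the attached
patch. It assumes the client is run as a subprocess and that a failed call
(non-zero exit code) should not reset the idle clock, which is one plausible
reading of "used to determine idle time". The names run_client, IdleTracker
and query are hypothetical and do not come from HOD.

```python
import subprocess
import time


def run_client(cmd, timeout=60):
    """Run a client command; return (exit_code, stdout) instead of raising."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # The client could not be started or it hung: report a generic failure.
        return 1, ""
    return proc.returncode, proc.stdout


class IdleTracker:
    """Tracks how long a cluster has gone without a confirmed running job."""

    def __init__(self, query, clock=time.time):
        # query is a hypothetical callable returning (exit_code, running_jobs).
        self.query = query
        self.clock = clock
        self.last_busy = clock()

    def idle_seconds(self):
        exit_code, running = self.query()
        # Assumption: only a successful call that reports running jobs counts
        # as activity. A failed call leaves last_busy alone, so an unresponsive
        # JobTracker still accumulates idle time.
        if exit_code == 0 and running > 0:
            self.last_busy = self.clock()
        return self.clock() - self.last_busy
```

In such a design the query callable would wrap run_client and parse the job
count from the client output. The cleaner would compare idle_seconds() against
its configured idle limit and deallocate the cluster once the limit is exceeded.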
> [HOD] Idle cluster cleanup does not work if the JobTracker becomes
> unresponsive to RPC calls
> --------------------------------------------------------------------------------------------
>
> Key: HADOOP-2847
> URL: https://issues.apache.org/jira/browse/HADOOP-2847
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/hod
> Affects Versions: 0.16.0
> Reporter: Hemanth Yamijala
> Assignee: Hemanth Yamijala
> Priority: Blocker
> Fix For: 0.16.1
>
> Attachments: hadoop-2847
>
>
> Under some error conditions, the Hadoop JobTracker becomes unresponsive to
> RPC calls (e.g., if a misconfiguration causes the JobTracker to run out of
> memory). In such cases, a cluster allocated by HOD no longer runs any jobs
> and holds on to nodes wastefully. The usual idle cluster cleaner should
> deallocate the cluster, but it does not.