[jira] Updated: (HADOOP-3523) [HOD] If a job does not exist in Torque's list of jobs, HOD allocate on previously allocated directory fails.

Hemanth Yamijala (JIRA) Tue, 10 Jun 2008 00:54:37 -0700

     [ https://issues.apache.org/jira/browse/HADOOP-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Hemanth Yamijala updated HADOOP-3523:
-------------------------------------

    Attachment: 3523.patch

The attached patch fixes the issue described above. We now check whether the 
exit code from qstat indicates that the job id is invalid (exit code 153) and 
treat that case as equivalent to a completed job. As a result, a previously 
allocated cluster whose cluster id is no longer present in Torque will continue 
to be auto-deallocated and allocated again. 

However, if any other Torque error occurs, we treat that as an unknown case 
and leave the deallocation to the user. 
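
The decision described above can be sketched roughly as follows. This is only 
an illustration and not the code in 3523.patch: the function names, the status 
strings and the job_completed flag are hypothetical. The only value taken from 
this issue is the qstat exit code 153 for an invalid job id.

```python
# Illustrative sketch only; names below are hypothetical, not HOD's API.

# Exit code qstat returned in this issue when the job id was not known.
QSTAT_INVALID_JOB_ID = 153


def cluster_status(qstat_exit_code, job_completed=False):
    """Classify a previously allocated cluster from a qstat result.

    qstat_exit_code: exit code of the qstat call for the cluster's job.
    job_completed:   whether qstat reported the job as completed
                     (only meaningful when the exit code is 0).
    """
    if qstat_exit_code == QSTAT_INVALID_JOB_ID:
        # Torque no longer knows the job: treat it like a completed job.
        return "completed"
    if qstat_exit_code != 0:
        # Any other Torque error: do not guess, leave it to the user.
        return "unknown"
    return "completed" if job_completed else "active"


def can_auto_deallocate(qstat_exit_code, job_completed=False):
    """Only clusters classified as completed are cleaned up automatically."""
    return cluster_status(qstat_exit_code, job_completed) == "completed"
```

With this shape, an allocate on a previously used cluster directory would 
proceed automatically for exit code 153, while any other non-zero exit code 
would leave the deallocation to the user.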

> [HOD] If a job does not exist in Torque's list of jobs, HOD allocate on 
> previously allocated directory fails.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3523
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3523
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>    Affects Versions: 0.18.0
>            Reporter: Hemanth Yamijala
>            Assignee: Hemanth Yamijala
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: 3523.patch
>
>
> HADOOP-3483 addressed the issue where a dead cluster could be reallocated 
> without having to issue warnings to users to clean up the directory 
> themselves, provided the job is completed. It missed one case, where the job 
> no longer exists in the Torque queue. When an allocate is attempted in that 
> case, HOD fails with an unhelpful error message:
> ERROR - qstat error: exit code: 153 | signal: False | core False
> CRITICAL - op: allocate hod-clusters/test 3 failed: <type 'exceptions.TypeError'> 'NoneType' object is unsubscriptable
> This should be addressed to avoid user concerns.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
