[jira] Commented: (HADOOP-3376) [HOD] HOD should have a way to detect and deal with clusters that violate/exceed resource manager limits

Karam Singh (JIRA) Mon, 02 Jun 2008 05:32:17 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601616#action_12601616
 ]


Karam Singh commented on HADOOP-3376:
-------------------------------------

To verify this issue, I set a MAXPROC limit (say 10) in maui.cfg and then did 
the following:

Added the following line under the hod section in hodrc:
job-feasibility-attr = User-limits exceeded. Requested:([0-9]*) Used:([0-9]*) MaxLimit:([0-9]*)
HOD requires this exact string in hodrc.
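
As an illustration of what the three capture groups in that value pick out, here 
is a minimal Python sketch. It is not HOD's actual code: the function name 
parse_limits and the sample comment string are hypothetical, and the sample 
simply assumes the scheduler emits text in the shape the pattern describes.

```python
import re

# Pattern copied verbatim from the job-feasibility-attr value quoted above.
FEASIBILITY_PATTERN = (
    r"User-limits exceeded. Requested:([0-9]*) Used:([0-9]*) MaxLimit:([0-9]*)"
)


def parse_limits(comment):
    """Return (requested, used, max_limit) as ints, or None if nothing usable matched.

    Hypothetical helper for illustration only.
    """
    match = re.search(FEASIBILITY_PATTERN, comment)
    if match is None:
        return None
    groups = match.groups()
    # [0-9]* also matches an empty string, so reject empty captures explicitly.
    if any(group == "" for group in groups):
        return None
    return tuple(int(group) for group in groups)


# Hypothetical sample text shaped like the pattern (values mirror a request
# for 11 nodes against a limit of 10 with nothing in use).
sample = "User-limits exceeded. Requested:11 Used:0 MaxLimit:10"
print(parse_limits(sample))
```

Because the text is matched literally apart from the three numeric groups, any 
deviation in the hodrc string means no match, which is consistent with HOD 
needing the exact string.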

1. Ran hod allocate with a number of nodes greater than the MAXPROC limit 
(say 11). Verified that hod exits with exit code 4 and the proper error 
message:
CRITICAL/50 hadoop:216 - Requested number of nodes exceeded maximum user 
limits. Current Usage:0, Requested:11, Maximum Limit:10 This cluster cannot be 
allocated now.

2. Tried a combination: first ran hod allocate with 5 nodes, then ran hod 
allocate again with 6 nodes. Verified that the second job got queued with the 
message:
CRITICAL/50 hadoop:216 - Requested number of nodes exceeded maximum user 
limits. Current Usage:5, Requested:6, Maximum Limit:10 This cluster allocation 
will succeed only after other clusters are deallocated.
Also verified that once the first cluster was deallocated, the second cluster 
got allocated.

Repeated with more hod allocate combinations
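
The two cases exercised above follow a simple decision rule, sketched below in 
Python. This is only an illustration under stated assumptions, not the HOD 
implementation: the function name classify_request and the labels it returns 
are hypothetical, and the rule is inferred from the two messages quoted above.

```python
def classify_request(requested, used, max_limit):
    """Classify an allocation request against a per-user node limit.

    Hypothetical helper; the labels are illustrative, not HOD identifiers.
    """
    if requested > max_limit:
        # Case 1: the request alone exceeds the limit, so it can never be
        # satisfied (in the test above hod exited with exit code 4).
        return "cannot be allocated"
    if used + requested > max_limit:
        # Case 2: the request fits on its own, but not alongside the nodes
        # already in use; it stays queued until other clusters are deallocated.
        return "queued until deallocation"
    return "within limits"


# Values taken from the two test cases in this comment (limit of 10).
print(classify_request(requested=11, used=0, max_limit=10))
print(classify_request(requested=6, used=5, max_limit=10))
```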


> [HOD] HOD should have a way to detect and deal with clusters that 
> violate/exceed resource manager limits
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3376
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3376
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: checklimits.sh, HADOOP-3376, HADOOP-3376.1, HADOOP-3376.2
>
>
> Currently, if we set up resource manager/scheduler limits on the jobs 
> submitted, any HOD cluster that exceeds/violates these limits may 1) get 
> blocked/queued indefinitely or 2) be blocked until resources occupied by old 
> clusters are freed. HOD should detect these scenarios and deal with them 
> intelligently, instead of just waiting for a long time or forever. This means 
> more and proper information to the submitter.
> (Internal) Use Case:
>      If there are no resource limits, users can flood the resource manager 
> queue, preventing other users from using the queue. To avoid this, we could 
> have various types of limits set up in either the resource manager or a 
> scheduler: max node limit in torque (per job limit), maxproc limit in maui 
> (per user/class), maxjob limit in maui (per user/class), etc. But there is 
> one problem with the current setup. For example, if we set up a maxproc limit 
> in maui to limit the aggregate number of nodes used by any user over all 
> jobs, 1) jobs get queued indefinitely if they exceed the max limit and 2) 
> jobs get blocked if they ask for nodes < max limit while some of the 
> resources are already used by jobs from the same user. This issue addresses 
> how to deal with scenarios like these.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
