[
https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Lowe reassigned MAPREDUCE-7148:
-------------------------------------
Assignee: Wang Yan (was: Jason Lowe)
Thanks for updating the patch! Apologies for the delay in re-reviewing; I've
been very busy lately.
Just one last nit with the rework: there is now a lot of redundant code in
reportError. All three code paths do the same thing, differing only in a
boolean that controls whether the job should fail. It would be easier to read
and maintain if the number of code paths were reduced by computing whether or
not the job should fast-fail in a boolean and then unconditionally calling
umbilical.fatalError, e.g.:
{code:java}
boolean fastFailJob = false;
[...]
if (hasClusterStorageCapacityExceededException) {
  [...]
  if (killJobWhenExceedClusterStorageCapacity) {
    LOG.error(...);
    fastFailJob = true;
  }
}
umbilical.fatalError(..., fastFailJob);
{code}
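For illustration, a fuller sketch of that consolidated shape follows. The
helper hasCause and the field names (taskId, killJobWhenExceedClusterStorageCapacity)
are placeholders for whatever the patch actually uses, not a prescription:
{code:java}
// Sketch only: assumes the surrounding Task class provides taskId, LOG,
// and the killJobWhenExceedClusterStorageCapacity flag read from the job conf.
private void reportError(Throwable throwable, TaskUmbilicalProtocol umbilical)
    throws IOException {
  boolean fastFailJob = false;
  // Detect the storage-capacity error anywhere in the cause chain.
  if (hasCause(throwable, ClusterStorageCapacityExceededException.class)
      && killJobWhenExceedClusterStorageCapacity) {
    LOG.error("Failing the job fast: " + taskId + " exceeded the cluster "
        + "storage capacity, so retries cannot succeed.", throwable);
    fastFailJob = true;
  }
  // Single exit point: report the error once, with the fast-fail decision.
  umbilical.fatalError(taskId, StringUtils.stringifyException(throwable),
      fastFailJob);
}

// Hypothetical helper: true if any link in the cause chain is of the type.
private static boolean hasCause(Throwable t, Class<? extends Throwable> type) {
  for (Throwable cur = t; cur != null; cur = cur.getCause()) {
    if (type.isInstance(cur)) {
      return true;
    }
  }
  return false;
}
{code}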
> Fast fail jobs when exceeds dfs quota limitation
> ------------------------------------------------
>
> Key: MAPREDUCE-7148
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7148
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: task
> Affects Versions: 2.7.0, 2.8.0, 2.9.0
> Environment: hadoop 2.7.3
> Reporter: Wang Yan
> Assignee: Wang Yan
> Priority: Major
> Attachments: MAPREDUCE-7148.001.patch, MAPREDUCE-7148.002.patch,
> MAPREDUCE-7148.003.patch, MAPREDUCE-7148.004.patch, MAPREDUCE-7148.005.patch,
> MAPREDUCE-7148.006.patch, MAPREDUCE-7148.007.patch, MAPREDUCE-7148.008.patch,
> MAPREDUCE-7148.009.patch
>
>
> We are running Hive jobs with a DFS quota limitation of 3 TB per job. If a
> job hits the DFS quota limitation, the task that hit it fails, and there are
> a few task retries before the job actually fails. The retries are not
> helpful because the job will always fail anyway. In one of the worse cases,
> a job with a single reduce task wrote more than 3 TB to HDFS over 20 hours;
> the reduce task exceeded the quota limitation and retried 4 times before the
> job finally failed, consuming a lot of unnecessary resources. This ticket
> aims to provide a feature that lets a job fail fast when it writes too much
> data to the DFS and exceeds the DFS quota limitation. The fast-fail
> mechanism was introduced in MAPREDUCE-7022 and MAPREDUCE-6489.
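For reference, opting a job into the proposed behavior from a client might
look like the following. The property name is an assumption based on the
patch discussion, not a confirmed key, and the default would presumably stay
false so existing retry behavior is unchanged:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical client-side usage: enable fast-fail for this job only.
Configuration conf = new Configuration();
conf.setBoolean("mapreduce.job.dfs.storage.capacity.kill-limit-exceed", true);
Job job = Job.getInstance(conf, "quota-bounded-job");
{code}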