[
https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Lowe reassigned MAPREDUCE-7148:
-------------------------------------
Assignee: Wang Yan (was: Jason Lowe)
Thanks for updating the patch! Apologies for the delay in re-reviewing; I've
been very busy lately.
Just one last nit with the rework: there is now a lot of redundant code in
reportError. All three code paths do the same thing, differing only in a
boolean that controls whether the job should fail. It would be easier to read
and maintain if the number of code paths were reduced by computing whether or
not the job should fast-fail in a boolean and then unconditionally calling
umbilical.fatalError, e.g.:
{code:java}
boolean fastFailJob = false;
[...]
if (hasClusterStorageCapacityExceededException) {
  [...]
  if (killJobWhenExceedClusterStorageCapacity) {
    LOG.error(...);
    fastFailJob = true;
  }
}
umbilical.fatalError(..., fastFailJob);
{code}
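For illustration, a fuller sketch of that consolidated shape follows. The
helper hasCause and the field names (taskId, killJobWhenExceedClusterStorageCapacity)
are placeholders for whatever the patch actually uses, not a prescription:
{code:java}
// Sketch only: assumes the surrounding Task class provides taskId, LOG,
// and the killJobWhenExceedClusterStorageCapacity flag read from the job conf.
private void reportError(Throwable throwable, TaskUmbilicalProtocol umbilical)
    throws IOException {
  boolean fastFailJob = false;
  // Detect the storage-capacity error anywhere in the cause chain.
  if (hasCause(throwable, ClusterStorageCapacityExceededException.class)
      && killJobWhenExceedClusterStorageCapacity) {
    LOG.error("Failing the job fast: " + taskId + " exceeded the cluster "
        + "storage capacity, so retries cannot succeed.", throwable);
    fastFailJob = true;
  }
  // Single exit point: report the error once, with the fast-fail decision.
  umbilical.fatalError(taskId, StringUtils.stringifyException(throwable),
      fastFailJob);
}

// Hypothetical helper: true if any link in the cause chain is of the type.
private static boolean hasCause(Throwable t, Class<? extends Throwable> type) {
  for (Throwable cur = t; cur != null; cur = cur.getCause()) {
    if (type.isInstance(cur)) {
      return true;
    }
  }
  return false;
}
{code}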
> Fast fail jobs when exceeds dfs quota limitation
> ------------------------------------------------
>
> Key: MAPREDUCE-7148
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7148
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: task
> Affects Versions: 2.7.0, 2.8.0, 2.9.0
> Environment: hadoop 2.7.3
> Reporter: Wang Yan
> Assignee: Wang Yan
> Priority: Major
> Attachments: MAPREDUCE-7148.001.patch, MAPREDUCE-7148.002.patch,
> MAPREDUCE-7148.003.patch, MAPREDUCE-7148.004.patch, MAPREDUCE-7148.005.patch,
> MAPREDUCE-7148.006.patch, MAPREDUCE-7148.007.patch, MAPREDUCE-7148.008.patch,
> MAPREDUCE-7148.009.patch
>
>
> We are running Hive jobs with a DFS quota limitation of 3 TB per job. If a
> job hits the DFS quota limitation, the task that hit it fails, and there are
> a few task retries before the job actually fails. The retries are not
> helpful because the job will always fail anyway. In one of the worse cases,
> a job with a single reduce task wrote more than 3 TB to HDFS over 20 hours;
> the reduce task exceeded the quota limitation and retried 4 times before the
> job finally failed, consuming a lot of unnecessary resources. This ticket
> aims to provide a feature that lets a job fail fast when it writes too much
> data to the DFS and exceeds the DFS quota limitation. The fast-fail
> mechanism was introduced in MAPREDUCE-7022 and MAPREDUCE-6489.
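For reference, opting a job into the proposed behavior from a client might
look like the following. The property name is an assumption based on the
patch discussion, not a confirmed key, and the default would presumably stay
false so existing retry behavior is unchanged:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical client-side usage: enable fast-fail for this job only.
Configuration conf = new Configuration();
conf.setBoolean("mapreduce.job.dfs.storage.capacity.kill-limit-exceed", true);
Job job = Job.getInstance(conf, "quota-bounded-job");
{code}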