[
https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wang Yan updated MAPREDUCE-7148:
--------------------------------
Description: We are running Hive jobs with a DFS quota limitation of 3 TB per
job. When a job hits the DFS quota limitation, the task that hit it fails, and
there are a few task retries before the job actually fails. The retries are
not helpful because the job will always fail anyway. In one of the worst
cases, a job with a single reduce task wrote more than 3 TB to HDFS over 20
hours; the reduce task exceeded the quota limitation and retried 4 times
before the job finally failed, consuming a lot of unnecessary resources. This
ticket aims to provide a feature that lets a job fail fast when it writes too
much data to the DFS and exceeds the DFS quota limitation. The fast-fail
feature was introduced in MAPREDUCE-7022 and MAPREDUCE-6489.
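To sketch the idea (this is not the actual patch): the core of a fail-fast
check is recognising that a task failed because of a DFS quota violation,
since such a failure cannot succeed on retry. In the sketch below, only
DSQuotaExceededException (org.apache.hadoop.hdfs.protocol) is a real Hadoop
class; the helper class and method names are made up for illustration.

import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;

/**
 * Hypothetical helper (names are illustrative, not from the patch): walks an
 * exception's cause chain to decide whether a task failure came from a DFS
 * space quota violation, i.e. a failure that no retry can fix.
 */
public final class QuotaFailureCheck {
  private QuotaFailureCheck() {
  }

  public static boolean isCausedByDfsQuota(Throwable failure) {
    for (Throwable cause = failure; cause != null; cause = cause.getCause()) {
      if (cause instanceof DSQuotaExceededException) {
        return true; // unrecoverable: fail the job fast instead of retrying
      }
    }
    return false;
  }
}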
> Fast fail jobs when they exceed the DFS quota limitation
> --------------------------------------------------------
>
> Key: MAPREDUCE-7148
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7148
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: task
> Affects Versions: 2.7.0, 2.8.0, 2.9.0
> Environment: hadoop 2.7.3
> Reporter: Wang Yan
> Priority: Major
> Attachments: MAPREDUCE-7148.001.patch
>
>
> We are running Hive jobs with a DFS quota limitation of 3 TB per job. When
> a job hits the DFS quota limitation, the task that hit it fails, and there
> are a few task retries before the job actually fails. The retries are not
> helpful because the job will always fail anyway. In one of the worst cases,
> a job with a single reduce task wrote more than 3 TB to HDFS over 20 hours;
> the reduce task exceeded the quota limitation and retried 4 times before
> the job finally failed, consuming a lot of unnecessary resources. This
> ticket aims to provide a feature that lets a job fail fast when it writes
> too much data to the DFS and exceeds the DFS quota limitation. The
> fast-fail feature was introduced in MAPREDUCE-7022 and MAPREDUCE-6489.
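Until a dedicated check is available, a coarse workaround is to cap task
attempts at 1 so that the first quota failure fails the job immediately. The
driver sketch below uses only the standard mapreduce.map.maxattempts and
mapreduce.reduce.maxattempts configuration keys; the class and job name are
hypothetical, and the tradeoff is that transient failures (e.g. a lost node)
are no longer retried either.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class NoRetryJobDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The default is 4 attempts per task; with 1, the first failure is fatal.
    conf.setInt("mapreduce.map.maxattempts", 1);
    conf.setInt("mapreduce.reduce.maxattempts", 1);
    Job job = Job.getInstance(conf, "quota-sensitive-job");
    // ... set mapper, reducer, input and output paths as usual ...
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}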