[ https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656566#comment-16656566 ]
Steve Loughran commented on MAPREDUCE-7148:
-------------------------------------------

As I've warned, I can't do the final vote; someone who knows MR needs to be the one to cast it. Regarding the bits I do know:

* That new exception should be tagged @Evolving, as tagging it @Stable boxes us out of making changes. Maybe even @LimitedPrivate(HDFS, MapReduce) for now, with an explanation of what it does (e.g. point at this JIRA and say "raised by HDFS to say what it does"). A sketch of this annotation follows at the end of this message.
* I like what you've done with the tests, but there's now a fair bit of redundant setup in each one. If the task and umbilical were made fields, the test {{setup()}} method could do the basic init, cutting these four identical lines from each test (see the {{setup()}} sketch at the end of this message):

{code}
Task task = Mockito.mock(Task.class);
TaskUmbilicalProtocol umbilical = Mockito.mock(TaskUmbilicalProtocol.class);
Configuration conf = new Configuration();
when(task.getConf()).thenReturn(conf);
{code}

> Fast fail jobs when exceeds dfs quota limitation
> ------------------------------------------------
>
>                 Key: MAPREDUCE-7148
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7148
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>    Affects Versions: 2.7.0, 2.8.0, 2.9.0
>         Environment: hadoop 2.7.3
>            Reporter: Wang Yan
>            Assignee: Wang Yan
>            Priority: Major
>         Attachments: MAPREDUCE-7148.001.patch, MAPREDUCE-7148.002.patch, MAPREDUCE-7148.003.patch, MAPREDUCE-7148.004.patch, MAPREDUCE-7148.005.patch, MAPREDUCE-7148.006.patch, MAPREDUCE-7148.007.patch
>
> We are running Hive jobs with a DFS quota limitation per job (3 TB). If a job hits the DFS quota limitation, the task that hit it fails, and there are a few task retries before the job actually fails. The retries are not very helpful because the job will always fail anyway. In one worse case, we had a job whose single reduce task wrote more than 3 TB to HDFS over 20 hours; the reduce task exceeded the quota limitation and retried 4 times until the job finally failed, consuming a lot of unnecessary resources. This ticket aims at providing the feature to let a job fail fast when it writes too much data to the DFS and exceeds the DFS quota limitation. The fast-fail feature is introduced in MAPREDUCE-7022 and MAPREDUCE-6489.
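
To make the annotation suggestion above concrete, here is a minimal sketch using Hadoop's InterfaceAudience/InterfaceStability annotations. The new exception's actual name and package come from the patch and are not shown in this comment, so the class name below is a hypothetical stand-in:

{code}
import java.io.IOException;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

/**
 * Raised by HDFS to tell MapReduce that a write has pushed the job past its
 * DFS quota, so the task can fail fast instead of retrying. See
 * MAPREDUCE-7148 for background.
 */
@InterfaceAudience.LimitedPrivate({"HDFS", "MapReduce"})
@InterfaceStability.Evolving
public class QuotaExceededFastFailException extends IOException {
  // Hypothetical class name, for illustration only; the patch defines the
  // real exception.
  public QuotaExceededFastFailException(String message) {
    super(message);
  }
}
{code}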
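
And a minimal sketch of the suggested {{setup()}} refactor, assuming a JUnit 4 test in the org.apache.hadoop.mapred package; the test class name here is hypothetical, since the patch's actual test class is not named in this comment:

{code}
package org.apache.hadoop.mapred;

import static org.mockito.Mockito.when;

import org.apache.hadoop.conf.Configuration;
import org.junit.Before;
import org.mockito.Mockito;

public class TestTaskFastFail {  // hypothetical name for the patch's test class
  private Task task;
  private TaskUmbilicalProtocol umbilical;
  private Configuration conf;

  @Before
  public void setup() {
    // The four lines previously repeated at the top of every test case:
    task = Mockito.mock(Task.class);
    umbilical = Mockito.mock(TaskUmbilicalProtocol.class);
    conf = new Configuration();
    when(task.getConf()).thenReturn(conf);
  }

  // Individual @Test methods can now use the task/umbilical/conf fields
  // directly instead of re-creating them.
}
{code}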
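
For the fail-fast behaviour the issue description asks for, a hedged sketch of the general idea only, not the patch's actual code: treat an HDFS space-quota error as fatal and report it through the umbilical so the attempt is not retried. This assumes the two-argument {{fatalError}} signature from branch-2 and uses a hypothetical helper class:

{code}
package org.apache.hadoop.mapred;

import java.io.IOException;

import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;

final class QuotaFastFail {  // hypothetical helper, for illustration only

  /** Work that may touch HDFS and therefore throw IOException. */
  interface IoWork {
    void run() throws IOException;
  }

  static void runAndFailFast(IoWork work, TaskUmbilicalProtocol umbilical,
      TaskAttemptID taskId) throws IOException {
    try {
      work.run();
    } catch (DSQuotaExceededException e) {
      // Once the job's DFS quota is exhausted, retries cannot succeed,
      // so report a fatal error instead of letting the attempt be retried.
      umbilical.fatalError(taskId, "DFS quota exceeded: " + e.getMessage());
      throw e;
    }
  }
}
{code}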