[
https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16648862#comment-16648862
]
Wang Yan edited comment on MAPREDUCE-7148 at 10/13/18 10:05 AM:
----------------------------------------------------------------
[[email protected]]
Thanks for your suggestion. I have updated the name of the exception in the
latest patch.
However, I find it is not easy to write a JUnit test for this fix, as I
explained here:
https://issues.apache.org/jira/browse/MAPREDUCE-7148?focusedCommentId=16643375&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16643375
Could you please suggest how to write a test for it, or can we go without the
JUnit test since we have tested it manually?
It is not easy to test a main function because we cannot mock anything.
In fact, there is no existing unit or integration test that covers
YarnChild.java's main function.
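For reference, here is a minimal sketch of the kind of JUnit test I have in
mind, assuming the quota-detection logic were extracted out of YarnChild.main
into a package-private static helper. The helper name causedByQuotaViolation
and the test class are hypothetical, not part of any posted patch:
{code:java}
package org.apache.hadoop.mapred;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import java.io.IOException;

import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;
import org.junit.Test;

// Hypothetical test, assuming YarnChild gained a package-private static
// helper causedByQuotaViolation(Throwable) holding the new detection logic.
public class TestYarnChildFastFail {

  @Test
  public void quotaViolationDetectedThroughCauseChain() {
    // A quota error as YarnChild would see it: wrapped in an IOException.
    IOException wrapped = new IOException("write failed",
        new DSQuotaExceededException("The DiskSpace quota is exceeded"));
    assertTrue(YarnChild.causedByQuotaViolation(wrapped));
  }

  @Test
  public void ordinaryFailureStillRetried() {
    assertFalse(YarnChild.causedByQuotaViolation(new IOException("disk error")));
  }
}
{code}
main itself would then stay a thin wrapper around the helper, so the part we
cannot test directly shrinks to almost nothing.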
> Fast fail jobs when exceeds dfs quota limitation
> ------------------------------------------------
>
> Key: MAPREDUCE-7148
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7148
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: task
> Affects Versions: 2.7.0, 2.8.0, 2.9.0
> Environment: hadoop 2.7.3
> Reporter: Wang Yan
> Assignee: Wang Yan
> Priority: Major
> Attachments: MAPREDUCE-7148.001.patch, MAPREDUCE-7148.002.patch,
> MAPREDUCE-7148.003.patch, MAPREDUCE-7148.004.patch, MAPREDUCE-7148.005.patch
>
>
> We are running Hive jobs with a per-job DFS quota limitation (3 TB). If a job
> hits the DFS quota limitation, the task that hit it fails, and there are a few
> task retries before the job actually fails. The retries are not helpful,
> because the job will always fail anyway. In one of the worse cases, a job had
> a single reduce task that wrote more than 3 TB to HDFS over 20 hours, exceeded
> the quota limitation, and was retried 4 times before the job finally failed,
> consuming a lot of unnecessary resources. This ticket aims to provide a
> feature that lets a job fail fast when it writes too much data to the DFS and
> exceeds the DFS quota limitation. The fast-fail mechanism is introduced in
> MAPREDUCE-7022 and MAPREDUCE-6489.
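> For illustration, a rough sketch (not the actual patch) of the detection side
> of this feature: walk the cause chain of a task failure and flag DFS quota
> violations so they can be reported through the fast-fail path. The class and
> method names below are hypothetical.
> {code:java}
> import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;
>
> public final class QuotaFastFail {
>
>   private QuotaFastFail() {}
>
>   /**
>    * Returns true if the failure (or any of its causes) is a DFS quota
>    * violation. Such failures should fail the job fast rather than be
>    * retried, since every retry will hit the same quota.
>    */
>   public static boolean causedByQuotaViolation(Throwable t) {
>     for (Throwable c = t; c != null; c = c.getCause()) {
>       if (c instanceof DSQuotaExceededException) {
>         return true;
>       }
>     }
>     return false;
>   }
> }
> {code}
> In YarnChild's top-level catch block this could then drive the fast-fail
> report, e.g. umbilical.fatalError(taskid, msg, causedByQuotaViolation(e)),
> assuming the three-argument fatalError(taskId, message, fastFail) signature
> introduced by MAPREDUCE-7022.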