[ https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643375#comment-16643375 ]
Wang Yan edited comment on MAPREDUCE-7148 at 10/9/18 12:56 PM:
---------------------------------------------------------------

* I have added the Apache license header to StorageCapacityExceededException.java.
* As for the failure of TestRPC.testClientBackOff (due to timeout), I don't think it is caused by my patch, since the test passes in my local environment.
* Manual test
** I found it is not easy to write a JUnit test for this feature, because the logic is in YarnChild's main function and I cannot mock anything there. Instead, I tested manually to confirm the behaviour. (Note that this feature does not work in uber mode.)
*** (1) Confirmed that fast fail works when the setting is true
**** mapreduce.job.dfs.storage.capacity.kill-limit-exceed=true
**** select * from tb1
*** (2) Confirmed that the failed task retries multiple times until the job fails when the setting is false
**** mapreduce.job.dfs.storage.capacity.kill-limit-exceed=false
**** select * from tb1
*** (3) Confirmed that the failed task retries multiple times until the job fails with the default value
**** select * from tb1


> Fast fail jobs when exceeds dfs quota limitation
> ------------------------------------------------
>
>                 Key: MAPREDUCE-7148
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7148
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>    Affects Versions: 2.7.0, 2.8.0, 2.9.0
>        Environment: hadoop 2.7.3
>            Reporter: Wang Yan
>            Assignee: Wang Yan
>            Priority: Major
>        Attachments: MAPREDUCE-7148.001.patch, MAPREDUCE-7148.002.patch, MAPREDUCE-7148.003.patch, MAPREDUCE-7148.004.patch
>
> We are running Hive jobs with a DFS quota limitation per job (3 TB). If a job hits the DFS quota limitation, the task that hit it fails, and there are a few task retries before the job actually fails. The retries are not very helpful because the job will always fail anyway. In one of the worse cases, a job with a single reduce task wrote more than 3 TB to HDFS over 20 hours; the reduce task exceeded the quota limitation and retried 4 times until the job finally failed, consuming a lot of unnecessary resources. This ticket aims to provide a feature that lets a job fail fast when it writes too much data to the DFS and exceeds the DFS quota limitation. The fast-fail mechanism is introduced in MAPREDUCE-7022 and MAPREDUCE-6489.
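For reference, below is a minimal sketch of the fast-fail decision that the manual tests above exercise. The property name mapreduce.job.dfs.storage.capacity.kill-limit-exceed is the one discussed in this ticket; the exception class and the helper method are illustrative stand-ins under my own naming, not the actual YarnChild code from the patch.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class QuotaFastFailSketch {

  // Stand-in for the StorageCapacityExceededException added by the patch.
  static class StorageCapacityExceededException extends RuntimeException {
    StorageCapacityExceededException(String msg) { super(msg); }
  }

  static final String KILL_LIMIT_EXCEED =
      "mapreduce.job.dfs.storage.capacity.kill-limit-exceed";

  /** Returns true if the task error should fail the job fast instead of retrying. */
  static boolean shouldFastFail(Configuration conf, Throwable error) {
    if (!conf.getBoolean(KILL_LIMIT_EXCEED, false)) {
      return false;                       // default: keep the normal retry behaviour
    }
    // Walk the cause chain: the quota error may be wrapped by the output stream.
    for (Throwable t = error; t != null; t = t.getCause()) {
      if (t instanceof StorageCapacityExceededException) {
        return true;                      // quota exceeded and fast fail enabled
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.setBoolean(KILL_LIMIT_EXCEED, true);
    Throwable error = new RuntimeException(
        new StorageCapacityExceededException("DFS quota exceeded"));
    System.out.println("fast fail? " + shouldFastFail(conf, error));  // prints: fast fail? true
  }
}
{code}

With the property left unset, the default of false preserves the existing retry behaviour, matching manual test case (3) above.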