[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643375#comment-16643375
 ] 

Wang Yan edited comment on MAPREDUCE-7148 at 10/9/18 1:00 PM:
--------------------------------------------------------------

* I have added the Apache license header to StorageCapacityExceededException.java.
 * As for the failure of TestRPC.testClientBackOff (due to timeout), I don't 
think it is caused by my patch, since the test passes in my local environment.
 * Manual test
 ** I find it is not easy to write a JUnit test for this feature, because the 
logic is in YarnChild's main function and I cannot mock anything there. Instead, 
I performed a manual test to confirm the behaviour (a rough sketch of the check 
is included after the test cases below).
 ** Test environment: I run Hive queries with a DFS quota limitation per job in 
non-uber mode (this feature does not work in uber mode). The limitation is small 
and the size of table tb1 is relatively large, so the job will exceed the DFS 
quota limitation.
 *** (1) confirmed that fast fail works when the setting is true
 **** mapreduce.job.dfs.storage.capacity.kill-limit-exceed=true
 **** select * from tb1
 *** (2) confirmed that the job does not fast fail when the setting is false
 **** mapreduce.job.dfs.storage.capacity.kill-limit-exceed=false
 **** select * from tb1
 *** (3) confirmed that the job does not fast fail with the default value
 **** select * from tb1
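
For reference, below is a minimal sketch of the kind of decision this feature 
makes; it is an illustration only, not the actual patch code. The class and 
method names are hypothetical, the configuration key is the one from the test 
cases above, and I assume the task failure surfaces HDFS's 
DSQuotaExceededException (the patch itself introduces 
StorageCapacityExceededException for this purpose).

{code:java}
// Hypothetical sketch, not the actual MAPREDUCE-7148 patch code.
// Assumes the task failure cause chain contains HDFS's DSQuotaExceededException.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;

public class QuotaFastFailSketch {

  // Configuration key from the test cases above; assumed to default to false.
  static final String KILL_LIMIT_EXCEED =
      "mapreduce.job.dfs.storage.capacity.kill-limit-exceed";

  /**
   * Returns true if a task failure caused by exceeding the DFS space quota
   * should fail the whole job immediately instead of being retried.
   */
  static boolean shouldFastFail(Configuration conf, Throwable failure) {
    if (!conf.getBoolean(KILL_LIMIT_EXCEED, false)) {
      return false; // default behaviour: let the task be retried as usual
    }
    // Walk the cause chain looking for a DFS quota violation.
    for (Throwable t = failure; t != null; t = t.getCause()) {
      if (t instanceof DSQuotaExceededException) {
        return true;
      }
    }
    return false;
  }
}
{code}

In YarnChild, a check like this would run in the task's top-level exception 
handler, with a positive result reported as a fatal error rather than a 
retriable task failure.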


was (Author: tiana528):
* I have added the Apache license header to StorageCapacityExceededException.java.
 * As for the failure of TestRPC.testClientBackOff (due to timeout), I don't 
think it is caused by my patch, since the test passes in my local environment.
 * Manual test
 ** I find it is not easy to write a JUnit test for this feature, because the 
logic is in YarnChild's main function and I cannot mock anything there. Instead, 
I performed a manual test to confirm the behaviour. (Note that this feature does 
not work in uber mode.)
 *** (1) confirmed that fast fail works when the setting is true
 **** mapreduce.job.dfs.storage.capacity.kill-limit-exceed=true
 **** select * from tb1
 *** (2) confirmed that the failed task retries multiple times until the job 
fails when the setting is false
 **** mapreduce.job.dfs.storage.capacity.kill-limit-exceed=false
 **** select * from tb1
 *** (3) confirmed that the failed task retries multiple times until the job 
fails with the default value
 **** select * from tb1

> Fast fail jobs when exceeds dfs quota limitation
> ------------------------------------------------
>
>                 Key: MAPREDUCE-7148
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7148
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>    Affects Versions: 2.7.0, 2.8.0, 2.9.0
>         Environment: hadoop 2.7.3
>            Reporter: Wang Yan
>            Assignee: Wang Yan
>            Priority: Major
>         Attachments: MAPREDUCE-7148.001.patch, MAPREDUCE-7148.002.patch, 
> MAPREDUCE-7148.003.patch, MAPREDUCE-7148.004.patch
>
>
> We are running Hive jobs with a DFS quota limitation per job (3TB). If a job 
> hits the DFS quota limitation, the task that hit it will fail and there will 
> be a few task retries before the job actually fails. The retries are not very 
> helpful because the job will always fail anyway. In one of the worse cases, a 
> job had a single reduce task that wrote more than 3TB to HDFS over 20 hours; 
> the reduce task exceeded the quota limitation and retried 4 times until the 
> job finally failed, consuming a lot of unnecessary resources. This ticket aims 
> to provide a feature that lets a job fail fast when it writes too much data to 
> DFS and exceeds the DFS quota limitation. The fast-fail mechanism is 
> introduced in MAPREDUCE-7022 and MAPREDUCE-6489.


