[ https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643375#comment-16643375 ]
Wang Yan edited comment on MAPREDUCE-7148 at 10/9/18 12:56 PM:
---------------------------------------------------------------

* I have added the Apache license header to StorageCapacityExceededException.java.
* As for the failure of TestRPC.testClientBackOff (due to timeout), I don't think it is caused by my patch, since the test passes in my local environment.
* Manual test
** I found it is not easy to write a JUnit test for this feature, because the logic is in YarnChild's main function and I cannot mock anything there. Instead, I tested manually to confirm the behaviour. (Note that this feature does not work in uber mode.)
*** (1) Confirmed that fast fail works when the setting is true
**** mapreduce.job.dfs.storage.capacity.kill-limit-exceed=true
**** select * from tb1
*** (2) Confirmed that the failed task retries multiple times until the job fails when the setting is false
**** mapreduce.job.dfs.storage.capacity.kill-limit-exceed=false
**** select * from tb1
*** (3) Confirmed that the failed task retries multiple times until the job fails with the default value
**** select * from tb1


> Fast fail jobs when exceeds dfs quota limitation
> ------------------------------------------------
>
>                 Key: MAPREDUCE-7148
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7148
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>    Affects Versions: 2.7.0, 2.8.0, 2.9.0
>        Environment: hadoop 2.7.3
>            Reporter: Wang Yan
>            Assignee: Wang Yan
>            Priority: Major
>        Attachments: MAPREDUCE-7148.001.patch, MAPREDUCE-7148.002.patch, MAPREDUCE-7148.003.patch, MAPREDUCE-7148.004.patch
>
> We are running Hive jobs with a DFS quota limitation per job (3 TB). If a job hits the DFS quota limitation, the task that hit it fails, and there are a few task retries before the job actually fails. The retries are not very helpful because the job will always fail anyway. In one of the worse cases, a job with a single reduce task wrote more than 3 TB to HDFS over 20 hours; the reduce task exceeded the quota limitation and retried 4 times until the job finally failed, consuming a lot of unnecessary resources. This ticket aims to provide a feature that lets a job fail fast when it writes too much data to the DFS and exceeds the DFS quota limitation. The fast-fail mechanism is introduced in MAPREDUCE-7022 and MAPREDUCE-6489.
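For reference, below is a minimal sketch of the fast-fail decision that the manual tests above exercise. The property name mapreduce.job.dfs.storage.capacity.kill-limit-exceed is the one discussed in this ticket; the exception class and the helper method are illustrative stand-ins under my own naming, not the actual YarnChild code from the patch.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class QuotaFastFailSketch {

  // Stand-in for the StorageCapacityExceededException added by the patch.
  static class StorageCapacityExceededException extends RuntimeException {
    StorageCapacityExceededException(String msg) { super(msg); }
  }

  static final String KILL_LIMIT_EXCEED =
      "mapreduce.job.dfs.storage.capacity.kill-limit-exceed";

  /** Returns true if the task error should fail the job fast instead of retrying. */
  static boolean shouldFastFail(Configuration conf, Throwable error) {
    if (!conf.getBoolean(KILL_LIMIT_EXCEED, false)) {
      return false;                       // default: keep the normal retry behaviour
    }
    // Walk the cause chain: the quota error may be wrapped by the output stream.
    for (Throwable t = error; t != null; t = t.getCause()) {
      if (t instanceof StorageCapacityExceededException) {
        return true;                      // quota exceeded and fast fail enabled
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.setBoolean(KILL_LIMIT_EXCEED, true);
    Throwable error = new RuntimeException(
        new StorageCapacityExceededException("DFS quota exceeded"));
    System.out.println("fast fail? " + shouldFastFail(conf, error));  // prints: fast fail? true
  }
}
{code}

With the property left unset, the default of false preserves the existing retry behaviour, matching manual test case (3) above.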