[jira] [Comment Edited] (TEZ-4110) Make Tez fail fast when DFS quota is exceeded

Wang Yan (Jira) Mon, 23 Dec 2019 00:18:04 -0800


    [ 
https://issues.apache.org/jira/browse/TEZ-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002094#comment-17002094
 ]


Wang Yan edited comment on TEZ-4110 at 12/23/19 8:16 AM:
---------------------------------------------------------

I created a patch for this issue and tested it internally; it works 
properly. However, the patch is blocked on the following two items, without 
which it does not compile:

(1) The Hadoop 3.3.0 release; MAPREDUCE-7148 has to be released first.

(2) Upgrading the Hadoop dependency in Tez to 3.3.0.


was (Author: tiana528):
I created a patch for this issue, I have tested it internally and it is working 
properly. But this patch is blocked by the following two, or compile cannot 
pass.

(1) hadoop release of 3.3.0, waiting for 
https://issues.apache.org/jira/browse/MAPREDUCE-7148 to be released.

(2) version up hadoop dependency in tez to 3.3.0.

> Make Tez fail fast when DFS quota is exceeded
> ---------------------------------------------
>
>                 Key: TEZ-4110
>                 URL: https://issues.apache.org/jira/browse/TEZ-4110
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.9.0, 0.8.4, 0.9.2
>         Environment: hadoop 2.9, hive 2.3, tez
>  
>            Reporter: Wang Yan
>            Priority: Minor
>
> This ticket aims to add a feature to Tez similar to MAPREDUCE-7148: make a 
> Tez job fail fast when the DFS quota limit is reached.
> Background: we run Hive jobs with a per-job DFS quota of 3 TB. If a job hits 
> the quota, the task that hit it fails, and there are a few task retries 
> before the job actually fails. The retries are not helpful because the job 
> will always fail anyway. In the worst cases, a job has a single reduce task 
> that writes more than 3 TB to HDFS over 20 hours; the reduce task exceeds 
> the quota and retries 4 times before the job finally fails, consuming a lot 
> of resources unnecessarily. This ticket aims to let a job fail fast when it 
> writes too much data to the DFS and exceeds the DFS quota.
>  
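
The fail-fast behaviour described above can be illustrated with a minimal 
sketch. This is not the attached patch and uses no Tez or Hadoop API: the 
class QuotaFailFast, the stand-in exception QuotaExceededStandIn, and the 
method names are hypothetical, chosen only to show the idea of skipping 
retries once a quota error appears anywhere in a task failure's cause chain.

```java
import java.io.IOException;

public class QuotaFailFast {

    /** Hypothetical stand-in for a DFS quota error; not a Hadoop class. */
    static class QuotaExceededStandIn extends IOException {
        QuotaExceededStandIn(String message) {
            super(message);
        }
    }

    /** Returns true if any throwable in the cause chain is a quota error. */
    static boolean isQuotaExceeded(Throwable failure) {
        int depth = 0;
        // The depth bound guards against a cyclic cause chain.
        while (failure != null && depth < 32) {
            if (failure instanceof QuotaExceededStandIn) {
                return true;
            }
            failure = failure.getCause();
            depth++;
        }
        return false;
    }

    /** A quota failure is never retried; other failures follow the attempt limit. */
    static boolean shouldRetry(Throwable failure, int attemptsSoFar, int maxAttempts) {
        if (isQuotaExceeded(failure)) {
            return false; // retrying would hit the same quota again
        }
        return attemptsSoFar < maxAttempts;
    }

    public static void main(String[] args) {
        Throwable quotaFailure = new RuntimeException(
                "task failed", new QuotaExceededStandIn("quota exceeded"));
        System.out.println(shouldRetry(quotaFailure, 1, 4));                  // false
        System.out.println(shouldRetry(new IOException("disk error"), 1, 4)); // true
    }
}
```

In the 20-hour reduce task example, a check of this kind would end the job 
after the first failed attempt instead of after the final retry.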



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
