[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644118#comment-16644118
 ] 

Jason Lowe commented on MAPREDUCE-7148:
---------------------------------------

Thanks for the patch!  Seems like a reasonable request and approach.

Not sure about the StorageCapacityExceededException name.  It implies that any 
kind of storage capacity error could fall underneath it, like a full local disk 
or a local disk quota, which _is_ something we would want to retry.  Not an 
issue for the current patch, more of a maintenance concern if other filesystems 
decide to start using it as part of their exception repertoire.  Depending upon 
the type of filesystem, it may or may not be appropriate for this feature.  I'm 
not sure offhand what a better name would be, just pointing out the potential 
for confusion there.
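
To illustrate the distinction I mean, here is a minimal sketch (DSQuotaExceededException is a real HDFS class; the helper class and method names are hypothetical, and the actual patch may wire this decision differently):

{code:java}
import java.io.IOException;
import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;

// Sketch only: shows why a broad "storage capacity" exception type is
// ambiguous for fast-fail purposes. An HDFS space-quota violation will never
// succeed on retry, but a full *local* disk often will once the node or a
// different attempt has space available.
public class QuotaFastFailCheck {

  /**
   * Hypothetical helper: returns true only for a DFS quota violation that
   * cannot be cured by retrying the task attempt.
   */
  public static boolean isFatalQuotaFailure(IOException e) {
    // DSQuotaExceededException: HDFS space quota exceeded -> retries are futile.
    if (e instanceof DSQuotaExceededException) {
      return true;
    }
    // Anything else (e.g. a full local disk surfacing as a generic
    // IOException) may be transient and should keep the normal retry path.
    return false;
  }
}
{code}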

Since this is a new feature, there really should be unit tests for it: one 
verifying that the master enable can turn the feature off, and one verifying 
that the feature works when the master enable is on.
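
Something along these lines would cover both cases. The property name "mapreduce.job.dfs.storage.capacity.kill-limit-exceed" is a placeholder for whatever master-enable key the patch defines, and isFastFailOnQuotaEnabled(...) stands in for the patch's actual check:

{code:java}
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.apache.hadoop.conf.Configuration;
import org.junit.Test;

// Sketch of the two unit tests requested above, using placeholder names.
public class TestQuotaFastFailEnable {

  @Test
  public void testFeatureDisabledByMasterEnable() {
    Configuration conf = new Configuration();
    conf.setBoolean("mapreduce.job.dfs.storage.capacity.kill-limit-exceed", false);
    // With the master enable off, a quota failure should follow the normal
    // retry path, i.e. the fast-fail decision must be false.
    assertFalse(isFastFailOnQuotaEnabled(conf));
  }

  @Test
  public void testFeatureActiveWhenMasterEnableOn() {
    Configuration conf = new Configuration();
    conf.setBoolean("mapreduce.job.dfs.storage.capacity.kill-limit-exceed", true);
    // With the master enable on, a quota failure should fail the job fast.
    assertTrue(isFastFailOnQuotaEnabled(conf));
  }

  // Placeholder for the patch's real lookup; here it simply reads the flag.
  private static boolean isFastFailOnQuotaEnabled(Configuration conf) {
    return conf.getBoolean(
        "mapreduce.job.dfs.storage.capacity.kill-limit-exceed", false);
  }
}
{code}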


> Fast fail jobs when exceeds dfs quota limitation
> ------------------------------------------------
>
>                 Key: MAPREDUCE-7148
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7148
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>    Affects Versions: 2.7.0, 2.8.0, 2.9.0
>         Environment: hadoop 2.7.3
>            Reporter: Wang Yan
>            Assignee: Wang Yan
>            Priority: Major
>         Attachments: MAPREDUCE-7148.001.patch, MAPREDUCE-7148.002.patch, 
> MAPREDUCE-7148.003.patch, MAPREDUCE-7148.004.patch
>
>
> We are running Hive jobs with a DFS quota limitation per job (3TB). If a job 
> hits the DFS quota limitation, the task that hit it will fail, and there will 
> be a few task retries before the job actually fails. The retries are not very 
> helpful because the job will always fail anyway. In one of the worse cases, a 
> job had a single reduce task that wrote more than 3TB to HDFS over 20 hours; 
> the reduce task exceeded the quota limitation and retried 4 times before the 
> job finally failed, consuming a lot of unnecessary resources. This ticket 
> aims to provide a feature that lets a job fail fast when it writes too much 
> data to the DFS and exceeds the DFS quota limitation. The fast fail feature 
> was introduced in MAPREDUCE-7022 and MAPREDUCE-6489.
