[ https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669127#comment-16669127 ]

Steve Loughran commented on MAPREDUCE-7148:
-------------------------------------------

I think LimitedPrivate is unrealistic and dangerous.

* It misleads people updating code into thinking they don't need to worry 
about breaking things.
* Yet for a long time stuff like UGI was marked this way (HADOOP-10776).
* It also encourages people to put in special back doors for other projects 
(e.g. HDFS-7694), which then end up being used broadly despite the fact that 
nobody wrote down what they were meant to do, did any real tests, etc. See 
also ProxyUsers and CreateFlag.NO_LOCAL_WRITE.
* You pretty much have to assume that any YARN app will need to use stuff 
marked as MR-only, so stop pretending otherwise.
* As a result you end up boxing yourself into bad designs which were done in 
a hurry because "it's only limited private", with semantics defined by 
whatever that quick implementation did.

The marker just lets us be lazy ("hey, it's limited private") rather than 
rigorous ("we have an API for people; what should it do?").

That said, we've recently marked a few things LimitedPrivate("Management 
tools") + Unstable, with the intended meaning "this is for management tools; 
we may break it and they will have to adapt". It clearly has a role in my 
code; it's just everywhere else that it's wrong and indefensible :)
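
For reference, that combination looks roughly like this on a class. A minimal 
sketch: the class and its javadoc are made up for illustration; the two 
annotations are the real ones from org.apache.hadoop.classification.

{code:java}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

/**
 * Hypothetical hook for external management tools. The annotations say:
 * only the named audience should call this, and even they get no
 * compatibility guarantee across releases.
 */
@InterfaceAudience.LimitedPrivate({"Management tools"})
@InterfaceStability.Unstable
public class DiagnosticsDump {
  // ...methods that management tools may call, subject to change...
}
{code}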

> Fast fail jobs when exceeds dfs quota limitation
> ------------------------------------------------
>
>                 Key: MAPREDUCE-7148
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7148
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>    Affects Versions: 2.7.0, 2.8.0, 2.9.0
>         Environment: hadoop 2.7.3
>            Reporter: Wang Yan
>            Assignee: Wang Yan
>            Priority: Major
>         Attachments: MAPREDUCE-7148.001.patch, MAPREDUCE-7148.002.patch, 
> MAPREDUCE-7148.003.patch, MAPREDUCE-7148.004.patch, MAPREDUCE-7148.005.patch, 
> MAPREDUCE-7148.006.patch, MAPREDUCE-7148.007.patch, MAPREDUCE-7148.008.patch
>
>
> We are running Hive jobs with a DFS quota limitation per job (3TB). If a job 
> hits the DFS quota limitation, the task that hit it will fail, and there 
> will be a few task retries before the job actually fails. The retries are 
> not very helpful because the job will always fail anyway. In one worse case, 
> we had a job whose single reduce task wrote more than 3TB to HDFS over 20 
> hours; the reduce task exceeded the quota limitation and retried 4 times 
> until the job failed in the end, consuming a lot of unnecessary resources. 
> This ticket aims at providing a feature to let a job fail fast when it 
> writes too much data to the DFS and exceeds the DFS quota limitation. The 
> fast-fail feature was introduced in MAPREDUCE-7022 and MAPREDUCE-6489.
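
The gist of the requested behaviour: treat a DFS quota failure as 
deterministic and skip the retries. A minimal sketch of that idea, assuming a 
hypothetical failure handler; DSQuotaExceededException is the real HDFS 
exception class, but everything else here is illustrative rather than the 
actual MAPREDUCE-7148 patch.

{code:java}
import java.io.IOException;

import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;

public final class TaskFailureHandler {

  /** A quota failure is deterministic: every retry will hit it again. */
  static boolean isUnrecoverable(IOException e) {
    return e instanceof DSQuotaExceededException;
  }

  static void handleTaskFailure(IOException e) throws IOException {
    if (isUnrecoverable(e)) {
      // Hypothetical hook: fail the job outright instead of scheduling
      // up to mapreduce.map/reduce.maxattempts retry attempts.
      reportFatalError(e);
    }
    throw e;  // otherwise let the normal retry machinery handle it
  }

  private static void reportFatalError(IOException e) {
    // Stand-in for however the task runtime surfaces a fatal,
    // non-retryable error to the application master.
    System.err.println("Fatal, not retrying: " + e.getMessage());
  }
}
{code}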


