[
https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668838#comment-16668838
]
Jason Lowe commented on MAPREDUCE-7148:
---------------------------------------
Thanks for updating the patch! Looks good overall, just some cleanup nits.
I'm personally not a fan of LimitedPrivate. I think it has limited (ha!)
utility in practice. For example, what is Tez supposed to do if they want to
implement the same feature? Are they not allowed to do so until LimitedPrivate
is removed from the class? If so then we need to file a followup JIRA to
remember to revisit this annotation. I wonder if marking it Unstable is more
useful than LimitedPrivate in practice, as a "buyer beware" for those who want
to use it downstream and are willing to risk a future incompatibility in a
later version of Hadoop. Not a must-change, but I'm curious what
[[email protected]] thinks about it, especially in the very likely scenario
that Tez wants to replicate this feature.
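
To make the suggestion concrete, this is roughly the annotation change I have
in mind; the class name below is just a placeholder, not the actual class in
the patch:
{code:java}
import java.io.IOException;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Rather than @InterfaceAudience.LimitedPrivate({"MapReduce"}), advertise
// instability so downstream projects like Tez can use the class while
// accepting that it may change incompatibly in a later Hadoop release.
@InterfaceAudience.Public
@InterfaceStability.Unstable
public class ExampleFailFastException extends IOException {
  public ExampleFailFastException(String message) {
    super(message);
  }
}
{code}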
reportError should not look up the conf value until it's necessary, i.e. until
the exception is known to be the relevant type.
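
Something along these lines is what I mean; the identifiers here are only
placeholders, not the actual names from the patch:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.protocol.QuotaExceededException;

class ReportErrorSketch {
  // Hypothetical conf key, just for illustration.
  private static final String FAIL_FAST_KEY = "mapreduce.job.example.fail-fast";
  private final Configuration conf = new Configuration();

  void reportError(Throwable cause) {
    // Cheap type check first; only touch the conf once we know the
    // exception is the relevant type.
    if (cause instanceof QuotaExceededException) {
      boolean failFast = conf.getBoolean(FAIL_FAST_KEY, false);
      if (failFast) {
        // ... request that the whole job be failed ...
      }
    }
    // ... the rest of the existing error-reporting path ...
  }
}
{code}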
The difference between WARN and ERROR for the log level is subtle, and WARN is
arguably an odd choice here. This error is fatal to the task attempt, i.e. the
entity emitting the log message, so IMHO ERROR is the bare minimum level. I'd
much rather see the log message state that it is requesting the job be
terminated than expect users to notice whether it's a WARN vs. an ERROR to
know the difference.
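
For example, wording along these lines (purely illustrative, not the patch's
actual message or logger):
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class LogWordingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(LogWordingSketch.class);

  void onQuotaExceeded(Exception cause) {
    // Spell out the consequence instead of relying on WARN vs. ERROR.
    LOG.error("Task attempt exceeded the DFS quota; requesting that the job"
        + " be failed fast instead of retrying this attempt", cause);
  }
}
{code}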
Nit: A lot of the tests are copy-n-paste. A private method that takes the
exception to throw and the expected value of the fail-job flag would help.
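
For example, a helper roughly like this (names are placeholders; runReportError
stands in for whatever the existing tests do to drive reportError):
{code:java}
import static org.junit.Assert.assertEquals;

// Sketch of the shared helper that would collapse the copy-n-paste.
private void verifyFastFail(Exception toThrow, boolean expectedFailJob)
    throws Exception {
  boolean failJob = runReportError(toThrow);   // hypothetical test helper
  assertEquals(expectedFailJob, failJob);
}
{code}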
> Fast fail jobs when exceeds dfs quota limitation
> ------------------------------------------------
>
> Key: MAPREDUCE-7148
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7148
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: task
> Affects Versions: 2.7.0, 2.8.0, 2.9.0
> Environment: hadoop 2.7.3
> Reporter: Wang Yan
> Assignee: Jason Lowe
> Priority: Major
> Attachments: MAPREDUCE-7148.001.patch, MAPREDUCE-7148.002.patch,
> MAPREDUCE-7148.003.patch, MAPREDUCE-7148.004.patch, MAPREDUCE-7148.005.patch,
> MAPREDUCE-7148.006.patch, MAPREDUCE-7148.007.patch, MAPREDUCE-7148.008.patch
>
>
> We are running Hive jobs with a DFS quota limitation per job (3TB). If a job
> hits the DFS quota limitation, the task that hit it will fail, and there will
> be a few task retries before the job actually fails. The retries are not very
> helpful because the job will always fail anyway. In one of the worse cases, we
> had a job with a single reduce task writing more than 3TB to HDFS over 20
> hours; the reduce task exceeded the quota limitation and retried 4 times until
> the job finally failed, consuming a lot of unnecessary resources. This ticket
> aims to provide a feature that lets a job fail fast when it writes too much
> data to the DFS and exceeds the DFS quota limitation. The fast-fail feature is
> introduced in MAPREDUCE-7022 and MAPREDUCE-6489.