[
https://issues.apache.org/jira/browse/MAPREDUCE-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Maysam Yabandeh updated MAPREDUCE-6489:
---------------------------------------
Attachment: MAPREDUCE-6489.003.patch
Thanks [~jlowe] for the detailed comments. The updated patch includes the following
changes:
* Interpreting a negative limit as no limit
* Updating the description of the config var to explain that it only covers writes
that affect BYTES_WRITTEN
* Using the next unused exit code, i.e., 69
* Using FATAL level for the log message. Also calling umbilical.fatalError.
* Changing the conf var name to mapreduce.task.local-fs.write-limit.bytes.
The "bytes" suffix is separated with a dot to be consistent with the existing var
names, in which the unit is specified at the end, separated with a dot, e.g.,
".bytes", ".kb", ".mb"
* Updating the test to actually write to the local file system and test the
specified limits.
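To make the limit semantics in the list above concrete, here is a minimal sketch of the check. It is not the code from the patch: the class and method names are hypothetical, and the strict "greater than" comparison is an assumption. Only the config key, the exit code, and the "negative means no limit" rule come from the list above.

```java
// Sketch only: class and method names are hypothetical, not taken from the patch.
class WriteLimitSketch {
    // Config key and exit code as named in the change list above.
    static final String LIMIT_KEY = "mapreduce.task.local-fs.write-limit.bytes";
    static final int LIMIT_EXIT_CODE = 69;

    /**
     * Returns true when the task should fail fast.
     * A negative limit is interpreted as "no limit".
     * Assumption: the limit is exceeded only when the counter is strictly greater.
     */
    static boolean exceedsLimit(long localBytesWritten, long limitBytes) {
        if (limitBytes < 0) {
            return false; // check disabled
        }
        return localBytesWritten > limitBytes;
    }
}
```

Under these assumptions, a limit of -1 never fails the task, while a limit of 0 fails it as soon as a single byte is counted in BYTES_WRITTEN for the local file system.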
Let me add that I am not entirely sure how the combination of
umbilical.fatalError and SystemUtil.exit works out when the event handler used
by umbilical is async. In this case the diagnostics update event would be queued,
and might never actually be handled if SystemUtil.exit is invoked first.
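The concern above can be illustrated with a small stand-alone sketch. It does not use the real umbilical or event handler classes; AsyncDiagnosticsSketch, reportFatal, and flush are hypothetical names. It only shows the general pattern: an event handed to an async dispatcher is merely queued, so the caller would need to wait for the queue to drain before exiting if the event must be handled. Whether the patch should do something like this is an open question, not a claim about its behavior.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: a hypothetical stand-in for an async event handler.
class AsyncDiagnosticsSketch {
    private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();
    private final AtomicInteger handled = new AtomicInteger();

    // Enqueues the event and returns immediately; handling happens later
    // on the dispatcher thread.
    void reportFatal(String message) {
        dispatcher.submit(() -> {
            handled.incrementAndGet();
        });
    }

    // Stops accepting new events and waits for the queued ones to be handled.
    // Returns false if the timeout elapsed or the wait was interrupted.
    boolean flush(long timeoutMillis) {
        dispatcher.shutdown();
        try {
            return dispatcher.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    int handledCount() {
        return handled.get();
    }
}
```

In this sketch, calling an exit method right after reportFatal could terminate the process before the dispatcher thread runs the queued event. Calling flush with a bounded timeout first would give the event a chance to be handled while still guaranteeing that the task exits.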
> Fail fast rogue tasks that write too much to local disk
> -------------------------------------------------------
>
> Key: MAPREDUCE-6489
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6489
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: task
> Affects Versions: 2.7.1
> Reporter: Maysam Yabandeh
> Assignee: Maysam Yabandeh
> Attachments: MAPREDUCE-6489.001.patch, MAPREDUCE-6489.002.patch,
> MAPREDUCE-6489.003.patch
>
>
> Tasks of rogue jobs can write too much to local disk, negatively
> affecting the jobs running in collocated containers. Ideally YARN will be
> able to limit the amount of local disk used by each task: YARN-4011. Until then,
> the MapReduce task can fail fast if it is writing too much (above a
> configured threshold) to local disk.
> As we discussed
> [here|https://issues.apache.org/jira/browse/YARN-4011?focusedCommentId=14902750&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14902750]
> the suggested approach is that the MapReduce task checks the BYTES_WRITTEN
> counter for the local disk and throws an exception when it goes beyond a
> configured value. It is true that the number of bytes written is larger than
> the actual used disk space, but to detect a rogue task the exact value is not
> required, and a very large value for bytes written to local disk is a good
> indication that the task is misbehaving.