Johan Gustavsson created MAPREDUCE-7022:
-------------------------------------------
Summary: Fast fail rogue jobs based on task scratch dir size
Key: MAPREDUCE-7022
URL: https://issues.apache.org/jira/browse/MAPREDUCE-7022
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Reporter: Johan Gustavsson
With the introduction of MAPREDUCE-6489 there are options to kill rogue
tasks based on the number of bytes written to local disk. In our environment,
where we mainly run Hive-based jobs, we noticed that this counter and the size
of the local scratch dirs can differ widely: we had one task whose
BYTES_WRITTEN counter was at 300 GB and another at 10 TB, both producing around
200 GB on local disk, so the counter-based limit did not help us much. To
extend this feature, tasks should monitor the size of their local scratch dir
and fail once it exceeds a configured limit. In that case the task should not
be retried either; instead the job should fail fast.
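As a rough sketch of the proposed check (not the actual patch): a task-side
monitor could periodically sum the on-disk size of its scratch dir and flag the
task for a non-retriable failure once a configured byte limit is exceeded. The
class name, method names, and the hard-coded limit below are all illustrative
assumptions, not existing Hadoop APIs or configuration keys.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.atomic.AtomicLong;

public class ScratchDirMonitor {

    // Illustrative default only; the real feature would read the limit from
    // job configuration (the property name is left unspecified here).
    static final long DEFAULT_LIMIT_BYTES = 200L * 1024 * 1024 * 1024; // 200 GB

    /** Recursively sum the sizes of all regular files under dir. */
    static long scratchDirSize(Path dir) throws IOException {
        final AtomicLong total = new AtomicLong();
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                total.addAndGet(attrs.size());
                return FileVisitResult.CONTINUE;
            }
        });
        return total.get();
    }

    /** True if the task should be failed fast: scratch dir exceeds the limit. */
    static boolean overLimit(Path dir, long limitBytes) throws IOException {
        return scratchDirSize(dir) > limitBytes;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        long size = scratchDirSize(dir);
        System.out.println(dir + " uses " + size + " bytes; over limit: "
                + (size > DEFAULT_LIMIT_BYTES));
    }
}
```

In an actual implementation the check would run on the same timer as the
existing MAPREDUCE-6489 disk-write check, and a breach would be reported to
the AM as a fatal, non-retriable error so the job fails fast instead of
re-scheduling attempts.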
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]