Johan Gustavsson created MAPREDUCE-7022:
-------------------------------------------
Summary: Fast fail rogue jobs based on task scratch dir size
Key: MAPREDUCE-7022
URL: https://issues.apache.org/jira/browse/MAPREDUCE-7022
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Reporter: Johan Gustavsson
With the introduction of MAPREDUCE-6489 there are options to kill rogue
tasks based on the number of bytes written to local disk. In our environment,
where we mainly run Hive-based jobs, we noticed that this counter and the size
of the local scratch dirs can differ widely: we had one task whose
BYTES_WRITTEN counter was at 300 GB and another at 10 TB, both producing around
200 GB on local disk, so the counter-based limit did not help us much. To
extend this feature, tasks should monitor the size of their local scratch dir
and fail once it exceeds a configured limit. In that case the task should not
be retried either; instead the job should fail fast.
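As a rough sketch of the proposed check (not the actual patch): a task-side
monitor could periodically sum the on-disk size of its scratch dir and flag the
task for a non-retriable failure once a configured byte limit is exceeded. The
class name, method names, and the hard-coded limit below are all illustrative
assumptions, not existing Hadoop APIs or configuration keys.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.atomic.AtomicLong;

public class ScratchDirMonitor {

    // Illustrative default only; the real feature would read the limit from
    // job configuration (the property name is left unspecified here).
    static final long DEFAULT_LIMIT_BYTES = 200L * 1024 * 1024 * 1024; // 200 GB

    /** Recursively sum the sizes of all regular files under dir. */
    static long scratchDirSize(Path dir) throws IOException {
        final AtomicLong total = new AtomicLong();
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                total.addAndGet(attrs.size());
                return FileVisitResult.CONTINUE;
            }
        });
        return total.get();
    }

    /** True if the task should be failed fast: scratch dir exceeds the limit. */
    static boolean overLimit(Path dir, long limitBytes) throws IOException {
        return scratchDirSize(dir) > limitBytes;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        long size = scratchDirSize(dir);
        System.out.println(dir + " uses " + size + " bytes; over limit: "
                + (size > DEFAULT_LIMIT_BYTES));
    }
}
```

In an actual implementation the check would run on the same timer as the
existing MAPREDUCE-6489 disk-write check, and a breach would be reported to
the AM as a fatal, non-retriable error so the job fails fast instead of
re-scheduling attempts.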
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]