[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated MAPREDUCE-6885:
-------------------------------------
    Description: 
If eventHandlingThread throws an exception while handling an event (e.g. 
because it is unable to flush an event to HDFS), the thread dies. Events 
continue to be enqueued and are eventually handled when JobHistoryEventHandler 
stops. If handling these events also throws an exception, the remaining events 
are lost. As a result, for example, job history files may never be moved to 
mapreduce.jobhistory.done-dir.

There should be some fail-safe logic here to prevent these events from being 
lost. We should also take care that the same exception is not logged for every 
event, so that the logs are not cluttered with the same stack trace. 
Perhaps we can make configurable the number of failed handleEvent calls 
allowed before finally giving up on a clean shutdown.

  was:
If eventHandlingThread handles an event which causes it to throw an exception 
(e.g. if it is unable to flush an event to HDFS), the thread dies. This thread 
is responsible for moving job history files to mapreduce.jobhistory.done-dir; 
if an exception is thrown, the files will not be moved there.

We should catch these exceptions so that the thread can still move these files 
when the job is complete.
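
The shutdown behaviour proposed in the updated description could be sketched 
roughly as follows. This is only an illustration under stated assumptions, not 
the JobHistoryEventHandler implementation: TolerantDrain, Result, drain and 
maxFailures are hypothetical names, the handler is passed in as a plain 
Consumer, and stderr stands in for the real logger. Each event is handled in 
its own try/catch so one failure does not lose the events behind it, a full 
stack trace is printed only the first time a given exception signature is 
seen, and the loop gives up once the failure budget is used.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical helper for illustration; not part of Hadoop. */
final class TolerantDrain {

    /** Summary of one drain attempt. */
    static final class Result<E> {
        final int handled;            // events handled without an exception
        final int stackTracesLogged;  // distinct exception signatures seen
        final boolean gaveUp;         // true if events were left unattempted
        final List<E> unhandled;      // failed events, then unattempted ones

        Result(int handled, int stackTracesLogged, boolean gaveUp, List<E> unhandled) {
            this.handled = handled;
            this.stackTracesLogged = stackTracesLogged;
            this.gaveUp = gaveUp;
            this.unhandled = unhandled;
        }
    }

    /**
     * Drains the queue, tolerating up to maxFailures handler exceptions.
     * maxFailures is an assumed setting; the issue only suggests that such a
     * limit could be made configurable.
     */
    static <E> Result<E> drain(Queue<E> queue, Consumer<? super E> handler, int maxFailures) {
        int handled = 0;
        int failures = 0;
        Set<String> seen = new HashSet<>();
        List<E> unhandled = new ArrayList<>();

        while (!queue.isEmpty() && failures < maxFailures) {
            E event = queue.poll();
            try {
                handler.accept(event);
                handled++;
            } catch (RuntimeException e) {
                failures++;
                unhandled.add(event);
                String signature = e.getClass().getName() + ": " + e.getMessage();
                if (seen.add(signature)) {
                    // First occurrence of this failure: log the full stack trace.
                    e.printStackTrace();
                } else {
                    // Repeat of a known failure: one line, no stack trace.
                    System.err.println("Repeated failure (" + failures + "/"
                            + maxFailures + "): " + signature);
                }
            }
        }

        // Anything still queued was never attempted; report it instead of dropping it silently.
        boolean gaveUp = !queue.isEmpty();
        unhandled.addAll(queue);
        queue.clear();
        return new Result<>(handled, seen.size(), gaveUp, unhandled);
    }
}
```

A caller would invoke something like TolerantDrain.drain(eventQueue, 
this::handleEvent, limit) from the stop path and decide what to do with the 
unhandled events, for instance retrying them or logging a summary. Catching 
only RuntimeException is a choice made for this sketch; a real fix would need 
to match whatever handleEvent actually throws.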


> JobHistory event handling does not complete if handling event throws 
> exception on shutdown
> ------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6885
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6885
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Jonathan Hung


