[
https://issues.apache.org/jira/browse/MAPREDUCE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Joseph Evans updated MAPREDUCE-3972:
-------------------------------------------
Attachment: MR-3972.txt
This patch depends on the patch in MR-4059, because both touch the same code
in fairly major ways. I will not mark this issue Patch Available until MR-4059
is in. I have tested the code on my single-node cluster. It passes all unit
tests, and FindBugs reports no issues with it.
> Locking and exception issues in JobHistory Server.
> --------------------------------------------------
>
> Key: MAPREDUCE-3972
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3972
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Components: mrv2
> Affects Versions: 0.23.2
> Reporter: Robert Joseph Evans
> Assignee: Robert Joseph Evans
> Attachments: MR-3972.txt
>
>
> The JobHistory server's locking is inconsistent and, in some cases, wrong.
> This is not critical, because the problems only show up if a job is being
> cleaned up, or moved from intermediate done to done, at the same time it is
> being parsed into a CompletedJob. However, the locking slows down the server
> in some cases and is a ticking time bomb that needs to be addressed.
> We also need to make sure that the Cleaner and intermediate-to-done
> migration threads handle exceptions properly. Currently it appears that the
> exception is logged and the thread just shuts down. This means that the
> history server could stay up and running for weeks without ever removing old
> jobs.
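
To illustrate the exception-handling concern in the description, here is a
minimal sketch, not the code from the attached patch. It relies on one
documented JDK behavior: a periodic task submitted with
ScheduledExecutorService.scheduleAtFixedRate is never run again once one
execution throws. The class name ResilientTask, the helper guarded, and the
50 ms period are all hypothetical and chosen only for this example.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ResilientTask {

    /**
     * Wraps a periodic task so that a runtime exception is logged and
     * swallowed instead of propagating to the scheduler. (Hypothetical
     * helper, not part of Hadoop.)
     */
    static Runnable guarded(String name, Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                // Log and carry on; the next scheduled run will still happen.
                System.err.println(name + " failed, will run again later: " + e);
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
        AtomicInteger runs = new AtomicInteger();

        // Stand-in for a cleaner pass: the first run fails, later runs succeed.
        Runnable cleaner = () -> {
            if (runs.incrementAndGet() == 1) {
                throw new IllegalStateException("simulated failure");
            }
        };

        // Without guarded(...), the first failure would cancel all later runs.
        pool.scheduleAtFixedRate(guarded("cleaner", cleaner),
                0, 50, TimeUnit.MILLISECONDS);

        Thread.sleep(300);
        pool.shutdownNow();
        System.out.println("cleaner ran " + runs.get() + " times");
    }
}
```

The wrapper catches only RuntimeException, so an Error such as
OutOfMemoryError still stops the task. Whether that is the right boundary for
the history server's threads is a design choice this sketch does not settle.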