Locking and exception issues in JobHistory Server.
--------------------------------------------------
Key: MAPREDUCE-3972
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3972
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.2
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
The JobHistory server's locking is inconsistent, and in some cases wrong. This
is not especially critical, because the problems only show up if a job is
being cleaned up, or moved from intermediate done to done, at the same time
that it is being parsed into a CompletedJob. However, the locking is slowing
down the server in some cases, and it is a ticking time bomb that needs to be
addressed.
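One way to make the locking consistent is to have every operation that touches
a job's history files (parse, move to done, delete) take the same per-job lock,
so a job cannot be moved or cleaned up while it is being parsed. The Java
sketch below is hypothetical: `JobFileLocks`, its `Location` states, and the
string returned by `parse` are illustrative stand-ins, not the actual
JobHistory classes. For brevity it never removes entries from the lock map,
which a real implementation would need to handle.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical sketch: all operations on one job share a single lock. */
final class JobFileLocks {
    enum Location { INTERMEDIATE_DONE, DONE, DELETED }

    // One lock per job ID, created lazily and shared by parse/move/delete.
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    // Stand-in for where each job's history file currently lives. Different
    // jobs use different locks, so the map itself must be thread-safe.
    private final Map<String, Location> locations = new ConcurrentHashMap<>();

    // Runs the action while holding the lock for this job only.
    private <T> T withJobLock(String jobId, Supplier<T> action) {
        ReentrantLock lock = locks.computeIfAbsent(jobId, id -> new ReentrantLock());
        lock.lock();
        try {
            return action.get();
        } finally {
            lock.unlock();
        }
    }

    void addIntermediate(String jobId) {
        withJobLock(jobId, () -> locations.put(jobId, Location.INTERMEDIATE_DONE));
    }

    /** Intermediate done -> done; returns false if the job is not in intermediate done. */
    boolean moveToDone(String jobId) {
        return withJobLock(jobId, () -> {
            if (locations.get(jobId) != Location.INTERMEDIATE_DONE) {
                return false;
            }
            locations.put(jobId, Location.DONE);
            return true;
        });
    }

    /** Cleaner path: removes an old job under the same lock. */
    boolean delete(String jobId) {
        return withJobLock(jobId, () -> {
            Location loc = locations.get(jobId);
            if (loc == null || loc == Location.DELETED) {
                return false;
            }
            locations.put(jobId, Location.DELETED);
            return true;
        });
    }

    /** Parse path: the job's location cannot change while the lock is held. */
    String parse(String jobId) {
        return withJobLock(jobId, () -> {
            Location loc = locations.get(jobId);
            if (loc == null || loc == Location.DELETED) {
                throw new IllegalStateException("job " + jobId + " no longer exists");
            }
            // Stand-in for reading and parsing the history file at 'loc'.
            return jobId + "@" + loc;
        });
    }
}
```

Because the lock is per job rather than global, parsing one job does not block
the cleaner or the mover from working on other jobs, which also addresses the
concern that coarse locking slows the server down.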
As part of this, we also need to be sure that the Cleaner and the
Intermediate-to-Done migration threads handle exceptions properly. Currently it
appears that when an exception is thrown, it is logged and the thread simply
exits. This means that the history server could stay up and running for weeks
without ever removing old jobs.
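If these threads are driven by a `ScheduledExecutorService` (an assumption;
the report does not say how they are implemented), note that
`scheduleAtFixedRate` suppresses all subsequent executions once a run throws.
A minimal sketch, assuming that scheduling model, wraps each run so that an
exception is logged and the next scheduled run still happens. `LogAndContinue`
is a hypothetical name, not an existing Hadoop class.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical wrapper: catches and logs exceptions from one run of a
 * periodic job so that the executor keeps scheduling later runs.
 */
final class LogAndContinue implements Runnable {
    private static final Logger LOG = Logger.getLogger(LogAndContinue.class.getName());

    private final String name;
    private final Runnable delegate;
    private final AtomicInteger failures = new AtomicInteger();

    LogAndContinue(String name, Runnable delegate) {
        this.name = name;
        this.delegate = delegate;
    }

    @Override
    public void run() {
        try {
            delegate.run();
        } catch (RuntimeException e) {
            // Without this catch, scheduleAtFixedRate would stop running the task.
            failures.incrementAndGet();
            LOG.log(Level.WARNING, name + " failed; will retry on the next run", e);
        }
    }

    /** Number of runs that ended in an exception so far. */
    int failures() {
        return failures.get();
    }
}
```

Only `RuntimeException` is caught; `Error`s such as `OutOfMemoryError` still
propagate and end the task, which is usually preferable to continuing in an
unknown state. Counting failures also gives a hook for alerting if the cleaner
keeps failing instead of silently never removing old jobs.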