[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-3972:
-------------------------------------------

    Attachment: MR-3972.txt

This patch should address all of the review comments so far.

The JobListCache is now a ConcurrentSkipListMap with no locking around it.  To 
make this work I decided it is acceptable for the cache to occasionally evict 
a few more jobs than expected.
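
To illustrate the trade-off, here is a minimal sketch of a lock-free cache 
that trims itself after each insert.  It is not the code from MR-3972.txt; 
the class name JobListCacheSketch, the Long/String key and value types, and 
the maxSize parameter are all assumptions made for the example.

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical stand-in for the job list cache; not taken from the patch.
class JobListCacheSketch {
    // Keys sort ascending, so the first entry is treated as the oldest job.
    private final ConcurrentSkipListMap<Long, String> cache =
        new ConcurrentSkipListMap<>();
    private final int maxSize;

    JobListCacheSketch(int maxSize) {
        this.maxSize = maxSize;
    }

    // Adds an entry, then evicts the oldest entries without taking any lock.
    void add(long jobId, String info) {
        cache.put(jobId, info);
        // size() on a ConcurrentSkipListMap is not a constant-time snapshot,
        // so concurrent callers may each see the map as over-full and
        // together evict a few more entries than strictly necessary.
        while (cache.size() > maxSize) {
            if (cache.pollFirstEntry() == null) {
                break; // another thread already emptied the map
            }
        }
    }

    // Returns null if the job was never cached or has been evicted.
    String get(long jobId) {
        return cache.get(jobId);
    }

    int size() {
        return cache.size();
    }
}
```

With a single writer the cache settles at exactly maxSize entries; only 
under concurrent adds can it briefly drop below that.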

The HistoryStorage class is no longer informed about items being removed from 
HDFS.  The CachedHistoryStorage tries to notice such removals ahead of time, 
but it is not critical if it misses one.

Please take a look: now that there is no locking around the JobListCache, 
some extra checks were needed to avoid NPEs.
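
As a rough illustration of the kind of check meant here (a sketch only, with 
a hypothetical class NullSafeLookupSketch and made-up key and value types), 
a reader of an unlocked map can fetch the value once and test it for null, 
rather than calling containsKey() and then get():

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical helper; the names are not from the patch.
class NullSafeLookupSketch {
    static String describe(ConcurrentSkipListMap<Long, String> cache,
                           long jobId) {
        // Read the value exactly once.  A containsKey() check followed by
        // get() could race with a concurrent removal and still see null.
        String info = cache.get(jobId);
        if (info == null) {
            return "unknown job " + jobId;
        }
        return "found " + info;
    }
}
```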
                
> Locking and exception issues in JobHistory Server.
> --------------------------------------------------
>
>                 Key: MAPREDUCE-3972
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3972
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>         Attachments: MR-3972.txt, MR-3972.txt, MR-3972.txt, MR-3972.txt, 
> MR-3972.txt
>
>
> The JobHistory server's locking is inconsistent and wrong in some cases.  
> This is not super critical because the issues would only show up if a job is 
> being cleaned up or moved from intermediate done to done, at the same time it 
> is being parsed into a CompletedJob.  However the locking is slowing down the 
> server in some cases, and is a ticking time bomb that needs to be addressed.
> As part of this we also need to be sure that the Cleaner and Intermediate to 
> Done migration threads handle exceptions properly.  Currently it appears 
> that the exception is logged, and the thread just shuts down.  This means 
> that the history server could still be up and running for weeks and never 
> remove old jobs.
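
One common way to keep a periodic thread alive after a failure, as the 
quoted description asks for, is to wrap the task so that an exception in one 
run is logged and counted instead of propagating.  The sketch below is an 
assumption about how this could be done, not the approach taken in the 
patch; ResilientTaskSketch and the failures counter are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper; not taken from the patch.
class ResilientTaskSketch {
    // Returns a Runnable that runs the task and swallows any Exception it
    // throws, recording the failure so the next scheduled run still happens.
    static Runnable resilient(Runnable task, AtomicInteger failures) {
        return () -> {
            try {
                task.run();
            } catch (Exception e) {
                failures.incrementAndGet();
                System.err.println("periodic task failed, will retry: " + e);
            }
        };
    }
}
```

This matters in particular when the task is handed to 
ScheduledExecutorService.scheduleAtFixedRate, which suppresses all 
subsequent executions once a run throws.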


        
