[jira] Updated: (HADOOP-3557) Class JobControl needs to be rewritten for safe concurrency

Aaron Greenhouse (JIRA) Fri, 13 Jun 2008 09:07:14 -0700

     [ 
https://issues.apache.org/jira/browse/HADOOP-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Aaron Greenhouse updated HADOOP-3557:
-------------------------------------

    Attachment: JobControl.patch

Patch file that applies the changes described.

> Class JobControl needs to be rewritten for safe concurrency
> -----------------------------------------------------------
>
>                 Key: HADOOP-3557
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3557
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.0
>            Reporter: Aaron Greenhouse
>         Attachments: JobControl.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The use of concurrency control and synchronization in JobControl is broken.  
> Consider
> - The locking in addToQueue() and toArrayList() is supposed to create 
> per-queue locking for the queues referenced by the fields readyJobs, 
> runningJobs, successfulJobs, waitingJobs, and failedJobs.  But this is 
> compromised by the fact that these methods are always called when the thread 
> is already synchronized on "this".
> - The run() method is written to use a busy-loop and Thread.sleep(int). But 
> runnerState is not generally accessed using synchronization, nor is it 
> volatile. This means that the run() is not guaranteed to ever view updates to 
> runnerState.  The run() method really needs to be changed to use wait(), and 
> to be signaled when changes are made to runnerState or to any of the queues.
> Let's also consider the existing synchronized methods:
> addJob() -- I'm pretty sure that the synchronization in addJob is only 
> necessary to protect the (indirect) access to the long field nextJobID. This 
> situation could be simplified by moving the synchronization to 
> getNextJobID(). As it stands, it makes the locking in addToQueue() redundant. 
> allFinished() -- This is synchronized to provide an atomic snapshot of the 
> state of the queues. It should be changed to use wait() and be notified 
> whenever the size of one of the queues changes. 
> checkRunningJobs(), checkWaitingJobs(), and startReadyJobs() -- These are all 
> synchronized to prevent them from interfering with each other, themselves, 
> and with addJobs(). This is too conservative. We really only need to be sure 
> that each one doesn't interfere with itself (otherwise we could end up 
> processing entries twice in parallel), but this is not possible based on the 
> intended usage: they are are only called from the run(), so they can never be 
> concurrency invoked. I don't see any danger from them running concurrently 
> with addJob(), as long as the integrity of the underlying queues is 
> protected. As with addJob(), the synchronization on these methods makes the 
> per-queue synchronization in addToQueue redundant. 
> This class should be rewritten as follows:
>     * The field groupName should be made final. 
>     * Add a new private boolean field stateChanged that is set to true 
> whenever the state of the object is changed from a public method call. This 
> is checked and reset by run(). 
>     * The per-queue synchronization needs to be scrapped: it isn't being 
> exploited any way. 
>     * The five queue fields plus the fields runnerState, nextJobID, and the 
> new stateChanged should all be protected by the lock this. 
>     * The run() method should be rewritten to use wait(). 
>     * public methods that change the state of the queues or runnerState 
> should set stateChanged to true and call notifyAll(). 
>     * Ideally, allFinished() should be changed to use wait() as well. This 
> would change the public interface of the class and break existing clients.  
> So this method should be @Deprecated, and a new waitForAllFinished() method 
> should be added.
>     * The queue fields should be assigned HashMap objects instead of 
> Hashtable objects. There is no advantage to using the synchronization 
> provided by Hashtable. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-3557) Class JobControl needs to be rewritten for safe concurrency

Reply via email to