GitHub user navsan opened a pull request:

    https://github.com/apache/incubator-quickstep/pull/33

    BugFix: Update NumQueuedWorkOrders to fix scheduling

    Quickstep scheduling is currently broken (since PR #14). The Foreman only 
schedules work for one worker, leaving all other workers idle. This PR fixes 
that bug. 
    
    The Foreman maintains the number of queued work orders for each worker in 
the WorkerDirectory. This count was not being incremented when WorkOrders were 
dispatched, nor decremented when WorkOrders completed. The search for 
LeastLoadedWorker would therefore always pick the first worker, resulting in 
serial execution of the entire query. 
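
To illustrate why stale counts serialize the query, here is a minimal sketch, not the actual Quickstep code. The class SketchWorkerDirectory and the function simulateDispatch are hypothetical names invented for this example, and it assumes that ties in the least-loaded search go to the lowest worker ID and that there is at least one worker.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for the WorkerDirectory: it only
// tracks the number of queued work orders per worker.
class SketchWorkerDirectory {
 public:
  explicit SketchWorkerDirectory(std::size_t num_workers)
      : num_queued_work_orders_(num_workers, 0) {}

  void incrementNumQueuedWorkOrders(std::size_t worker_id) {
    ++num_queued_work_orders_[worker_id];
  }

  void decrementNumQueuedWorkOrders(std::size_t worker_id) {
    assert(num_queued_work_orders_[worker_id] > 0);
    --num_queued_work_orders_[worker_id];
  }

  // std::min_element returns the first minimum, so when all counts are
  // equal (for example, all stuck at zero) worker 0 always wins.
  std::size_t getLeastLoadedWorker() const {
    return static_cast<std::size_t>(
        std::min_element(num_queued_work_orders_.begin(),
                         num_queued_work_orders_.end()) -
        num_queued_work_orders_.begin());
  }

 private:
  std::vector<int> num_queued_work_orders_;
};

// Dispatches num_work_orders to the least-loaded worker each time and
// returns how many work orders each worker received. With
// update_counts == false the counts are never touched, which mimics the
// bug described above. Assumes num_workers > 0.
std::vector<int> simulateDispatch(std::size_t num_workers,
                                  int num_work_orders,
                                  bool update_counts) {
  SketchWorkerDirectory directory(num_workers);
  std::vector<int> dispatched(num_workers, 0);
  for (int i = 0; i < num_work_orders; ++i) {
    const std::size_t worker_id = directory.getLeastLoadedWorker();
    ++dispatched[worker_id];
    if (update_counts) {
      directory.incrementNumQueuedWorkOrders(worker_id);
    }
  }
  return dispatched;
}
```

In this sketch, simulateDispatch(4, 8, false) sends all eight work orders to worker 0, while simulateDispatch(4, 8, true) spreads them two per worker.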
    
    In this PR, the Foreman increments the number of queued work orders for a 
worker whenever it dispatches a work order message to that worker. 
    
    When a WorkOrderCompletion message arrives, this number must be 
decremented. However, the worker's thread ID is not available to the Foreman, 
since PR #14 moved the message deserialization into the PolicyEnforcer. So, in 
this PR, I've added a pointer to the WorkerDirectory in the PolicyEnforcer. The 
PolicyEnforcer decrements the number of queued work orders while processing 
WorkOrderCompletion messages. 
    
    The WorkerDirectory is not thread-safe, so the decrement should only be 
done in the Foreman thread. Since the PolicyEnforcer runs in the Foreman 
thread, this change should be safe. 
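
The wiring described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: MiniWorkerDirectory, MiniPolicyEnforcer, CompletionMessage and runForemanSketch are hypothetical names, and the completion message is assumed to carry the worker's ID once deserialized.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical deserialized payload of a WorkOrderCompletion message.
struct CompletionMessage {
  std::size_t worker_id;
};

// Hypothetical, simplified directory. It is not thread-safe, so every
// call below is assumed to happen on a single (Foreman) thread.
class MiniWorkerDirectory {
 public:
  explicit MiniWorkerDirectory(std::size_t num_workers)
      : queued_(num_workers, 0) {}

  void increment(std::size_t worker_id) { ++queued_.at(worker_id); }

  void decrement(std::size_t worker_id) {
    assert(queued_.at(worker_id) > 0);
    --queued_.at(worker_id);
  }

  int numQueued(std::size_t worker_id) const { return queued_.at(worker_id); }

 private:
  std::vector<int> queued_;
};

// Hypothetical enforcer that holds a non-owning pointer to the directory
// and decrements the count while processing a completion message.
class MiniPolicyEnforcer {
 public:
  explicit MiniPolicyEnforcer(MiniWorkerDirectory *worker_directory)
      : worker_directory_(worker_directory) {}

  void processWorkOrderCompletion(const CompletionMessage &message) {
    worker_directory_->decrement(message.worker_id);
  }

 private:
  MiniWorkerDirectory *worker_directory_;  // Not owned.
};

// Single-threaded walk-through: the "Foreman" dispatches two work orders
// to worker 1, then one completion arrives. Returns worker 1's count.
int runForemanSketch() {
  MiniWorkerDirectory directory(2);
  MiniPolicyEnforcer enforcer(&directory);
  directory.increment(1);
  directory.increment(1);
  enforcer.processWorkOrderCompletion(CompletionMessage{1});
  return directory.numQueued(1);
}
```

Because both the increment and the decrement happen on the same thread in this sketch, no locking is needed around the directory.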
    
    Tested on a CloudLab machine (40 workers) with a few example queries. 
    
    [A big thanks to @rogersjeffreyl for helping me debug this!]

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-quickstep fix_scheduler

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-quickstep/pull/33.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #33
    
----
commit 74e49fa7f91ab33e20d488ef9923b285214bc04e
Author: Navneet Potti <nav...@apache.org>
Date:   2016-06-15T02:52:25Z

    BugFix: Update NumQueuedWorkOrders to fix scheduling

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
