Merge pull request #159 from liancheng/dagscheduler-actor-refine

Migrate the daemon thread started by DAGScheduler to Akka actor

`DAGScheduler` adopts an event queue and a daemon thread polling the it to 
process events sent to a `DAGScheduler`.  This is a classical actor use case.  
By migrating this thread to Akka actor, we may benefit from both cleaner code 
and better performance (context switching cost of Akka actor is much less than 
that of a native thread).

But things become a little complicated when taking existing test code into 
consideration.

Code in `DAGSchedulerSuite` is somewhat tightly coupled with `DAGScheduler`, 
and directly calls `DAGScheduler.processEvent` instead of posting event 
messages to `DAGScheduler`.  To minimize code change, I chose to let the actor 
to delegate messages to `processEvent`.  Maybe this doesn't follow conventional 
actor usage, but I tried to make it apparently correct.

Another tricky part is that, since `DAGScheduler` depends on the `ActorSystem` 
provided by its field `env`, `env` cannot be null.  But the `dagScheduler` 
field created in `DAGSchedulerSuite.before` was given a null `env`.  What's 
more, `BlockManager.blockIdsToBlockManagers` checks whether `env` is null to 
determine whether to run the production code or the test code (bad smell here, 
huh?).  I went through all callers of `BlockManager.blockIdsToBlockManagers`, 
and made sure that if `env != null` holds, then `blockManagerMaster == null` 
must also hold.  That's the logic behind `BlockManager.scala` [line 
896](https://github.com/liancheng/incubator-spark/compare/dagscheduler-actor-refine?expand=1#diff-2b643ea78c1add0381754b1f47eec132L896).

At last, since `DAGScheduler` instances are always `start()`ed after creation, 
I removed the `start()` method, and starts the `eventProcessActor` within the 
constructor.


Project: http://git-wip-us.apache.org/repos/asf/incubator-spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spark/commit/2054c61a
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spark/tree/2054c61a
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spark/diff/2054c61a

Branch: refs/heads/master
Commit: 2054c61a18c277c00661b89bbae365470c297031
Parents: 9290e5b e2a43b3
Author: Matei Zaharia <[email protected]>
Authored: Wed Nov 13 16:49:55 2013 -0800
Committer: Matei Zaharia <[email protected]>
Committed: Wed Nov 13 16:49:55 2013 -0800

----------------------------------------------------------------------
 .../scala/org/apache/spark/SparkContext.scala   |   1 -
 .../apache/spark/scheduler/DAGScheduler.scala   | 104 +++++++------------
 .../org/apache/spark/storage/BlockManager.scala |   4 +-
 .../spark/scheduler/DAGSchedulerSuite.scala     |   2 +-
 4 files changed, 43 insertions(+), 68 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/2054c61a/core/src/main/scala/org/apache/spark/SparkContext.scala
----------------------------------------------------------------------

Reply via email to