Merge pull request #29 from rxin/kill

Job killing

Moving https://github.com/mesos/spark/pull/935 here

The high level idea is to have an "interrupted" field in TaskContext, and a 
task should check that flag to determine if its execution should continue. For 
convenience, I provide an InterruptibleIterator which wraps around a normal 
iterator but checks for the interrupted flag. I also provide an 
InterruptibleRDD that wraps around an existing RDD.

As part of this pull request, I added an AsyncRDDActions class that provides a 
number of RDD actions that return a FutureJob (extending 
scala.concurrent.Future). The FutureJob can be used to kill the job execution, 
or waits until the job finishes.

This is NOT ready for merging yet. Remaining TODOs:

1. Add unit tests
2. Add job killing functionality for local scheduler (current job killing 
functionality only works in cluster scheduler)

-------------

Update on Oct 10, 2013:

This is ready!

Related future work:
- Figure out how to handle the job triggered by RangePartitioner (this one is 
tough; might become future work)
- Java API
- Python API


Project: http://git-wip-us.apache.org/repos/asf/incubator-spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spark/commit/e33b1839
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spark/tree/e33b1839
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spark/diff/e33b1839

Branch: refs/heads/master
Commit: e33b1839e27249e232a2126cec67a38109e03243
Parents: 3b11f43 9cd8786
Author: Patrick Wendell <pwend...@gmail.com>
Authored: Mon Oct 14 22:25:47 2013 -0700
Committer: Patrick Wendell <pwend...@gmail.com>
Committed: Mon Oct 14 22:25:47 2013 -0700

----------------------------------------------------------------------
 .../apache/spark/BlockStoreShuffleFetcher.scala |  12 +-
 .../scala/org/apache/spark/CacheManager.scala   |   4 +-
 .../scala/org/apache/spark/FutureAction.scala   | 250 +++++++++++++++++++
 .../apache/spark/InterruptibleIterator.scala    |  30 +++
 .../scala/org/apache/spark/ShuffleFetcher.scala |   5 +-
 .../scala/org/apache/spark/SparkContext.scala   |  37 ++-
 .../scala/org/apache/spark/TaskContext.scala    |  23 +-
 .../scala/org/apache/spark/TaskEndReason.scala  |   2 +
 .../org/apache/spark/executor/Executor.scala    | 118 +++++++--
 .../spark/executor/MesosExecutorBackend.scala   |  18 +-
 .../executor/StandaloneExecutorBackend.scala    |  10 +-
 .../org/apache/spark/rdd/AsyncRDDActions.scala  | 122 +++++++++
 .../org/apache/spark/rdd/CheckpointRDD.scala    |   2 +-
 .../org/apache/spark/rdd/CoGroupedRDD.scala     |   6 +-
 .../scala/org/apache/spark/rdd/HadoopRDD.scala  |  64 ++---
 .../spark/rdd/MapPartitionsWithContextRDD.scala |  41 +++
 .../spark/rdd/MapPartitionsWithIndexRDD.scala   |  41 ---
 .../org/apache/spark/rdd/NewHadoopRDD.scala     |  79 +++---
 .../org/apache/spark/rdd/PairRDDFunctions.scala |  16 +-
 .../spark/rdd/ParallelCollectionRDD.scala       |   5 +-
 .../main/scala/org/apache/spark/rdd/RDD.scala   |  95 ++++---
 .../org/apache/spark/rdd/ShuffledRDD.scala      |   2 +-
 .../org/apache/spark/rdd/SubtractedRDD.scala    |   2 +-
 .../apache/spark/scheduler/DAGScheduler.scala   | 118 +++++----
 .../spark/scheduler/DAGSchedulerEvent.scala     |  24 +-
 .../spark/scheduler/DAGSchedulerSource.scala    |   2 +-
 .../org/apache/spark/scheduler/JobWaiter.scala  |  62 +++--
 .../scala/org/apache/spark/scheduler/Pool.scala |   5 +-
 .../org/apache/spark/scheduler/ResultTask.scala |  44 ++--
 .../spark/scheduler/SchedulableBuilder.scala    |   3 +
 .../apache/spark/scheduler/ShuffleMapTask.scala |  46 ++--
 .../apache/spark/scheduler/SparkListener.scala  |   2 +-
 .../scala/org/apache/spark/scheduler/Task.scala |  63 ++++-
 .../apache/spark/scheduler/TaskScheduler.scala  |   3 +
 .../org/apache/spark/scheduler/TaskSet.scala    |   4 +
 .../scheduler/cluster/ClusterScheduler.scala    |  37 ++-
 .../cluster/ClusterTaskSetManager.scala         |  81 +++---
 .../scheduler/cluster/SchedulerBackend.scala    |   6 +-
 .../cluster/StandaloneClusterMessage.scala      |   2 +
 .../cluster/StandaloneSchedulerBackend.scala    |   7 +
 .../spark/scheduler/local/LocalScheduler.scala  | 190 +++++---------
 .../scheduler/local/LocalTaskSetManager.scala   |  13 +-
 .../org/apache/spark/CacheManagerSuite.scala    |   9 +-
 .../org/apache/spark/CheckpointSuite.scala      |   4 +-
 .../scala/org/apache/spark/JavaAPISuite.java    |   2 +-
 .../org/apache/spark/JobCancellationSuite.scala | 177 +++++++++++++
 .../apache/spark/rdd/AsyncRDDActionsSuite.scala | 176 +++++++++++++
 .../spark/rdd/PairRDDFunctionsSuite.scala       |   2 +-
 .../spark/scheduler/DAGSchedulerSuite.scala     |  14 +-
 .../spark/scheduler/cluster/FakeTask.scala      |   5 +-
 .../scheduler/local/LocalSchedulerSuite.scala   |  28 ++-
 51 files changed, 1563 insertions(+), 550 deletions(-)
----------------------------------------------------------------------


Reply via email to