Merge pull request #29 from rxin/kill Job killing
Moving https://github.com/mesos/spark/pull/935 here The high level idea is to have an "interrupted" field in TaskContext, and a task should check that flag to determine if its execution should continue. For convenience, I provide an InterruptibleIterator which wraps around a normal iterator but checks for the interrupted flag. I also provide an InterruptibleRDD that wraps around an existing RDD. As part of this pull request, I added an AsyncRDDActions class that provides a number of RDD actions that return a FutureJob (extending scala.concurrent.Future). The FutureJob can be used to kill the job execution, or waits until the job finishes. This is NOT ready for merging yet. Remaining TODOs: 1. Add unit tests 2. Add job killing functionality for local scheduler (current job killing functionality only works in cluster scheduler) ------------- Update on Oct 10, 2013: This is ready! Related future work: - Figure out how to handle the job triggered by RangePartitioner (this one is tough; might become future work) - Java API - Python API Project: http://git-wip-us.apache.org/repos/asf/incubator-spark/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-spark/commit/e33b1839 Tree: http://git-wip-us.apache.org/repos/asf/incubator-spark/tree/e33b1839 Diff: http://git-wip-us.apache.org/repos/asf/incubator-spark/diff/e33b1839 Branch: refs/heads/master Commit: e33b1839e27249e232a2126cec67a38109e03243 Parents: 3b11f43 9cd8786 Author: Patrick Wendell <pwend...@gmail.com> Authored: Mon Oct 14 22:25:47 2013 -0700 Committer: Patrick Wendell <pwend...@gmail.com> Committed: Mon Oct 14 22:25:47 2013 -0700 ---------------------------------------------------------------------- .../apache/spark/BlockStoreShuffleFetcher.scala | 12 +- .../scala/org/apache/spark/CacheManager.scala | 4 +- .../scala/org/apache/spark/FutureAction.scala | 250 +++++++++++++++++++ .../apache/spark/InterruptibleIterator.scala | 30 +++ .../scala/org/apache/spark/ShuffleFetcher.scala | 5 +- .../scala/org/apache/spark/SparkContext.scala | 37 ++- .../scala/org/apache/spark/TaskContext.scala | 23 +- .../scala/org/apache/spark/TaskEndReason.scala | 2 + .../org/apache/spark/executor/Executor.scala | 118 +++++++-- .../spark/executor/MesosExecutorBackend.scala | 18 +- .../executor/StandaloneExecutorBackend.scala | 10 +- .../org/apache/spark/rdd/AsyncRDDActions.scala | 122 +++++++++ .../org/apache/spark/rdd/CheckpointRDD.scala | 2 +- .../org/apache/spark/rdd/CoGroupedRDD.scala | 6 +- .../scala/org/apache/spark/rdd/HadoopRDD.scala | 64 ++--- .../spark/rdd/MapPartitionsWithContextRDD.scala | 41 +++ .../spark/rdd/MapPartitionsWithIndexRDD.scala | 41 --- .../org/apache/spark/rdd/NewHadoopRDD.scala | 79 +++--- .../org/apache/spark/rdd/PairRDDFunctions.scala | 16 +- .../spark/rdd/ParallelCollectionRDD.scala | 5 +- .../main/scala/org/apache/spark/rdd/RDD.scala | 95 ++++--- .../org/apache/spark/rdd/ShuffledRDD.scala | 2 +- .../org/apache/spark/rdd/SubtractedRDD.scala | 2 +- .../apache/spark/scheduler/DAGScheduler.scala | 118 +++++---- .../spark/scheduler/DAGSchedulerEvent.scala | 24 +- .../spark/scheduler/DAGSchedulerSource.scala | 2 +- .../org/apache/spark/scheduler/JobWaiter.scala | 62 +++-- .../scala/org/apache/spark/scheduler/Pool.scala | 5 +- .../org/apache/spark/scheduler/ResultTask.scala | 44 ++-- .../spark/scheduler/SchedulableBuilder.scala | 3 + .../apache/spark/scheduler/ShuffleMapTask.scala | 46 ++-- .../apache/spark/scheduler/SparkListener.scala | 2 +- .../scala/org/apache/spark/scheduler/Task.scala | 63 ++++- .../apache/spark/scheduler/TaskScheduler.scala | 3 + .../org/apache/spark/scheduler/TaskSet.scala | 4 + .../scheduler/cluster/ClusterScheduler.scala | 37 ++- .../cluster/ClusterTaskSetManager.scala | 81 +++--- .../scheduler/cluster/SchedulerBackend.scala | 6 +- .../cluster/StandaloneClusterMessage.scala | 2 + .../cluster/StandaloneSchedulerBackend.scala | 7 + .../spark/scheduler/local/LocalScheduler.scala | 190 +++++--------- .../scheduler/local/LocalTaskSetManager.scala | 13 +- .../org/apache/spark/CacheManagerSuite.scala | 9 +- .../org/apache/spark/CheckpointSuite.scala | 4 +- .../scala/org/apache/spark/JavaAPISuite.java | 2 +- .../org/apache/spark/JobCancellationSuite.scala | 177 +++++++++++++ .../apache/spark/rdd/AsyncRDDActionsSuite.scala | 176 +++++++++++++ .../spark/rdd/PairRDDFunctionsSuite.scala | 2 +- .../spark/scheduler/DAGSchedulerSuite.scala | 14 +- .../spark/scheduler/cluster/FakeTask.scala | 5 +- .../scheduler/local/LocalSchedulerSuite.scala | 28 ++- 51 files changed, 1563 insertions(+), 550 deletions(-) ----------------------------------------------------------------------