GitHub user uce opened a pull request: https://github.com/apache/flink/pull/754

[FLINK-2119] Add ExecutionGraph support for batch scheduling

This PR adds support for a newly introduced scheduling mode `BATCH_FROM_SOURCES`.

The goal for me was to make this change *minimally invasive* in order to not touch too much core code shortly before the release. Essentially, this only touches two parts of the codebase: the scheduling action for blocking results and the job vertices.

If you set the scheduling mode to `BATCH_FROM_SOURCES`, you can manually configure which input vertices are used as the sources when scheduling (`setAsBatchSource`). You can then manually specify the successor vertices (`addBatchSuccessors`), which are scheduled after the blocking results are finished. When no successors are specified manually, the result consumers are scheduled as before.

Mixing pipelined and blocking results currently leads to unspecified behaviour (i.e. it's not a good idea to do this at the moment).

When you have something like this:

```
           O sink
           |
           . <------------- denotes a pipelined result
           O union
     +----´|`----+
     |     |     |
     ■     ■     ■ <------- denotes a blocking result
     O     O     O
    src0  src1  src2
```

You can first schedule `src0`, `src1`, and `src2` one after another, and then continue with the `union-sink` pipeline.

```java
src[0].setAsBatchSource();          // src0 is the first to go...

src[0].addBatchSuccessors(src[1]);  // src0 => src1

src[1].addBatchSuccessors(src[2]);  // src1 => src2

src[2].addBatchSuccessors(union);   // src2 => [union => sink]
```

@StephanEwen or @tillrohrmann will work on the Optimizer/JobGraph counterpart of this and will build the `JobGraph` for programs in batch mode using the methods introduced in this PR.

Do you guys think that this minimal support is sufficient for the first version?

(Going over the result partition notification code, I really think it's pressing to refactor it. It is very hard to understand. The corresponding issue [FLINK-1833](https://issues.apache.org/jira/browse/FLINK-1833) was created a while back.
I want to do this after the release.)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uce/incubator-flink legs-2119

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/754.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #754

----
commit 4ac15982700257d3deb2d55a389afd0531f7f8be
Author: Ufuk Celebi <u...@apache.org>
Date:   2015-06-01T21:12:47Z

    [FLINK-2119] Add ExecutionGraph support for batch scheduling

----
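
For readers who want to see the scheduling order from the PR description spelled out, here is a minimal toy sketch. It is not Flink code: `BatchSchedulingSketch`, `Vertex`, and `schedulingOrder` are hypothetical names, and the method bodies are assumptions. Only the method names `setAsBatchSource` and `addBatchSuccessors` are borrowed from the description. The sketch assumes that a vertex's batch successors become eligible once that vertex has been scheduled and its blocking result is finished. It leaves out `sink`, which consumes a pipelined result and would be deployed together with `union`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: these are NOT Flink's classes. The method names mirror the
// ones mentioned in the PR description; everything else is an assumption.
class BatchSchedulingSketch {

    static final class Vertex {
        final String name;
        boolean batchSource;
        final List<Vertex> batchSuccessors = new ArrayList<>();

        Vertex(String name) {
            this.name = name;
        }

        // Marks this vertex as a starting point for batch scheduling.
        void setAsBatchSource() {
            batchSource = true;
        }

        // Registers vertices to schedule after this vertex's blocking result is finished.
        void addBatchSuccessors(Vertex... successors) {
            batchSuccessors.addAll(List.of(successors));
        }
    }

    // Returns vertex names in scheduling order: batch sources first, then the
    // manually configured successors of each vertex that has been scheduled.
    static List<String> schedulingOrder(List<Vertex> vertices) {
        Deque<Vertex> ready = new ArrayDeque<>();
        for (Vertex v : vertices) {
            if (v.batchSource) {
                ready.add(v);
            }
        }
        Set<Vertex> scheduled = new LinkedHashSet<>();
        while (!ready.isEmpty()) {
            Vertex v = ready.poll();
            if (scheduled.add(v)) {            // skip vertices that were already scheduled
                ready.addAll(v.batchSuccessors);
            }
        }
        List<String> names = new ArrayList<>();
        for (Vertex v : scheduled) {
            names.add(v.name);
        }
        return names;
    }

    public static void main(String[] args) {
        Vertex src0 = new Vertex("src0");
        Vertex src1 = new Vertex("src1");
        Vertex src2 = new Vertex("src2");
        Vertex union = new Vertex("union");

        src0.setAsBatchSource();          // src0 is the first to go
        src0.addBatchSuccessors(src1);    // src0 => src1
        src1.addBatchSuccessors(src2);    // src1 => src2
        src2.addBatchSuccessors(union);   // src2 => union

        System.out.println(schedulingOrder(List.of(src0, src1, src2, union)));
        // prints [src0, src1, src2, union]
    }
}
```

In this toy model, a vertex that is neither marked as a batch source nor reachable through batch successors is never scheduled, which is one way to see why the sources have to be configured explicitly in this mode.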