GitHub user uce opened a pull request:

    https://github.com/apache/flink/pull/754

    [FLINK-2119] Add ExecutionGraph support for batch scheduling

    This PR adds support for a newly introduced scheduling mode 
`BATCH_FROM_SOURCES`. The goal for me was to make this change *minimally 
invasive* in order to not touch too much core code shortly before the release.
    
    Essentially, this only touches two parts of the codebase: the scheduling 
action for blocking results and the job vertices.
    
    If you set the scheduling mode to `BATCH_FROM_SOURCES`, you can manually 
configure which input vertices are used as the sources when scheduling 
(`setAsBatchSource`). You can then manually specify the successor vertices 
(`addBatchSuccessors`), which are scheduled after the blocking results are 
finished. When no successors are specified manually, the result consumers 
are scheduled as before. Mixing pipelined and blocking results currently leads 
to unspecified behaviour, so it is best avoided for now.
    
    When you have something like this:
    ```
            O sink
            |
            . <------------- denotes a pipelined result
            O union
      +----´|`----+
      |     |     |
      ■     ■     ■ <------- denotes a blocking result
      O     O     O
     src0  src1  src2
    ```
    You can first schedule `src0`, `src1`, `src2`, and then continue with 
the `union-sink` pipeline.
    
    ```java
    src[0].setAsBatchSource(); // src0 is the first to go...
    
    src[0].addBatchSuccessors(src[1]); // src0 => src1
    
    src[1].addBatchSuccessors(src[2]); // src1 => src2
    
    src[2].addBatchSuccessors(union); // src2 => [union => sink]
    ```
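
To illustrate the order that this chain is meant to produce, here is a minimal, self-contained sketch. It is a toy model and not the ExecutionGraph code in this PR: the `Vertex` class, its fields, and `schedulingOrder` are hypothetical stand-ins, and only the method names `setAsBatchSource` and `addBatchSuccessors` mirror the snippet above. It assumes that pipelined consumers are scheduled together with their producer and that batch successors follow once the producer's blocking result is finished.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the batch-successor chain; NOT Flink's actual JobVertex. */
class BatchChainSketch {

    /** Hypothetical stand-in for a job vertex. */
    static final class Vertex {
        final String name;
        boolean batchSource;
        final List<Vertex> batchSuccessors = new ArrayList<>();
        final List<Vertex> pipelinedConsumers = new ArrayList<>();

        Vertex(String name) {
            this.name = name;
        }

        void setAsBatchSource() {
            batchSource = true;
        }

        void addBatchSuccessors(Vertex... successors) {
            for (Vertex v : successors) {
                batchSuccessors.add(v);
            }
        }
    }

    /** Starts at every batch source and follows the configured chain. */
    static List<String> schedulingOrder(List<Vertex> vertices) {
        List<String> order = new ArrayList<>();
        Set<Vertex> seen = new HashSet<>();
        for (Vertex v : vertices) {
            if (v.batchSource) {
                schedule(v, seen, order);
            }
        }
        return order;
    }

    private static void schedule(Vertex v, Set<Vertex> seen, List<String> order) {
        if (!seen.add(v)) {
            return; // already scheduled
        }
        order.add(v.name);
        // Assumption: pipelined consumers are scheduled along with the producer.
        for (Vertex c : v.pipelinedConsumers) {
            schedule(c, seen, order);
        }
        // Assumption: batch successors follow once the blocking result is done.
        for (Vertex s : v.batchSuccessors) {
            schedule(s, seen, order);
        }
    }

    /** Builds the src0/src1/src2 -> union -> sink example from the diagram. */
    static List<String> exampleOrder() {
        Vertex[] src = {new Vertex("src0"), new Vertex("src1"), new Vertex("src2")};
        Vertex union = new Vertex("union");
        Vertex sink = new Vertex("sink");
        union.pipelinedConsumers.add(sink); // union => sink is pipelined

        src[0].setAsBatchSource();
        src[0].addBatchSuccessors(src[1]);
        src[1].addBatchSuccessors(src[2]);
        src[2].addBatchSuccessors(union);

        return schedulingOrder(List.of(src[0], src[1], src[2], union, sink));
    }

    public static void main(String[] args) {
        System.out.println(exampleOrder()); // prints [src0, src1, src2, union, sink]
    }
}
```

In this model only `src0` is marked as a batch source, so nothing else starts until the chain reaches it.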
    
    @StephanEwen or @tillrohrmann will work on the Optimizer/JobGraph 
counterpart of this and will build the `JobGraph` for programs in batch mode 
using the methods introduced in this PR. Do you guys think that this minimal 
support is sufficient for the first version?
    
    (Going over the result partition notification code, I really think it's 
pressing to refactor it. It is very very hard to understand. The corresponding 
issue [FLINK-1833](https://issues.apache.org/jira/browse/FLINK-1833) has been 
created a while back. I want to do this after the release.)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uce/incubator-flink legs-2119

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/754.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #754
    
----
commit 4ac15982700257d3deb2d55a389afd0531f7f8be
Author: Ufuk Celebi <u...@apache.org>
Date:   2015-06-01T21:12:47Z

    [FLINK-2119] Add ExecutionGraph support for batch scheduling

----

