There is one more effect of the changes: since there is no longer a
distinction between input/output vertices, and since disconnected flows are
also accepted now, the job manager no longer rejects certain graphs that it
used to reject.

That is actually desirable, but I think the streaming API made use of that
behavior to validate that programs have at least one connected source and
sink.

These validation checks need to happen at a different point now.


On Sat, Sep 20, 2014 at 8:25 PM, Stephan Ewen <[email protected]> wrote:

> Edit: I have not pushed it, I am about to push ;-)
>
> Just needed to rebase on the latest master and tests are pending...
>
> On Sat, Sep 20, 2014 at 8:24 PM, Stephan Ewen <[email protected]> wrote:
>
>> Hi!
>>
>> I have just pushed a big patch to rework the JobManager job and
>> scheduling classes. It fixes some scalability and robustness issues,
>> simplifies the task hierarchies, and makes the code ready for some of the
>> prepared next features (incremental/interactive jobs).
>>
>> The pull request is https://github.com/apache/incubator-flink/pull/122
>>
>> What will affect developers who program against the lower-level APIs
>> (like the streaming parts) is the following:
>>
>>  - No more distinction between input/intermediate/output tasks
>>  - Intermediate data sets have a data structure now. This implies that
>> some methods change slightly (more in name than in meaning).
>>    In the future, data sets can be consumed many times, but for now, the
>> network stack supports only one consumer.
>>  - The conceptual change that receivers attach senders as inputs (and
>> grab their outgoing data streams), rather than senders forwarding to
>>    receivers means that the wiring of JobGraphs is now the other way
>> around.
>>  - No more distinction between in-memory and network channels. All
>> channels have always been automatically in-memory when sender
>>    and receiver are co-located. The flag was purely a scheduler hint,
>> which is obsolete now (see below).
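To make the wiring reversal in the third point concrete, here is a minimal toy sketch in plain Java. These are not the actual Flink classes: `Vertex`, `IntermediateDataSet`, and `connectDataSetAsInput` are illustrative names only. The point is the direction: the receiver attaches the sender's intermediate data set as its input, rather than the sender forwarding to the receiver.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for an intermediate result produced by a vertex.
class IntermediateDataSet {
    final Vertex producer;
    IntermediateDataSet(Vertex producer) { this.producer = producer; }
}

// Illustrative job vertex; it holds the data sets it consumes.
class Vertex {
    final String name;
    final List<IntermediateDataSet> inputs = new ArrayList<>();
    Vertex(String name) { this.name = name; }

    // New direction: the receiver grabs the sender's output as its input.
    void connectDataSetAsInput(Vertex producer) {
        inputs.add(new IntermediateDataSet(producer));
    }
}

public class WiringSketch {
    public static void main(String[] args) {
        Vertex source = new Vertex("source");
        Vertex sink = new Vertex("sink");
        // Wiring runs consumer -> producer now, not producer -> consumer.
        sink.connectDataSetAsInput(source);
        System.out.println(sink.inputs.get(0).producer.name); // prints "source"
    }
}
```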
>>
>>
>> Most importantly:
>>  - The scheduling is a bit different now. Instead of instance sharing,
>> we now have SlotSharing Groups, which give you
>>    a way to share resources across tasks. They behave more dynamically,
>> which is important for more dynamic environments
>>    and when a cluster has fewer task slots than the parallelism of some
>> tasks.
>>  - For cases that need strict co-location of tasks, we now have
>> CoLocationConstraints. The Batch API uses them to ensure that
>>    head, tail, and tasks inside a closed-loop iteration are co-located.
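A toy sketch of the slot-sharing idea in plain Java (`Task` and `SlotSharingGroup` here are illustrative stand-ins, not the real Flink classes): because one subtask of each member task can run in the same slot, a sharing group needs only as many slots as the maximum parallelism among its tasks, not the sum.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative task with a name and a degree of parallelism.
class Task {
    final String name;
    final int parallelism;
    Task(String name, int parallelism) {
        this.name = name;
        this.parallelism = parallelism;
    }
}

// Illustrative sharing group: one subtask of each member task may
// occupy the same slot, so the slot demand is the max parallelism.
class SlotSharingGroup {
    private final List<Task> tasks = new ArrayList<>();
    void add(Task task) { tasks.add(task); }
    int requiredSlots() {
        return tasks.stream().mapToInt(t -> t.parallelism).max().orElse(0);
    }
}

public class SlotSharingSketch {
    public static void main(String[] args) {
        SlotSharingGroup group = new SlotSharingGroup();
        group.add(new Task("source", 4));
        group.add(new Task("map", 8));
        group.add(new Task("sink", 2));
        // Without sharing this pipeline would need 4 + 8 + 2 = 14 slots;
        // with sharing it fits into max(4, 8, 2) = 8 slots.
        System.out.println(group.requiredSlots()); // prints 8
    }
}
```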
>>
>>  Stephan
>>
>>
>
>
