There is one more effect of the changes: since there is no longer a distinction between input/output vertices, and since disconnected flows are now accepted as well, the JobManager will no longer reject certain graphs that it used to reject.
That is actually desirable, but I think the streaming API made use of that behavior to validate that programs have at least a connected source and sink. This check needs to happen at a different point now.

On Sat, Sep 20, 2014 at 8:25 PM, Stephan Ewen <[email protected]> wrote:

> Edit: I have not pushed it, I am about to push ;-)
>
> Just needed to rebase on the latest master, and tests are pending...
>
> On Sat, Sep 20, 2014 at 8:24 PM, Stephan Ewen <[email protected]> wrote:
>
>> Hi!
>>
>> I have just pushed a big patch to rework the JobManager job and
>> scheduling classes. It fixes some scalability and robustness issues,
>> simplifies the task hierarchies, and makes the code ready for some of
>> the prepared next features (incremental/interactive jobs).
>>
>> The pull request is https://github.com/apache/incubator-flink/pull/122
>>
>> What will affect developers that go against the lower-level APIs (like
>> the streaming parts) is the following:
>>
>> - No more distinction between input/intermediate/output tasks.
>> - Intermediate data sets have a data structure now. This implies that
>>   some methods change slightly (more in name than in meaning).
>>   In the future, data sets can be consumed many times, but for now,
>>   the network stack supports only one consumer.
>> - The conceptual change that receivers attach senders as inputs (and
>>   grab their outgoing data streams), rather than senders forwarding to
>>   receivers, means that the wiring of JobGraphs is now the other way
>>   around.
>> - No more distinction between in-memory and network channels. Channels
>>   have always been automatically in-memory when sender and receiver
>>   are co-located. The flag was purely a scheduler hint, which is
>>   obsolete now (see below).
>>
>> Most importantly:
>> - The scheduling is a bit different now. Instead of instance sharing,
>>   we now have SlotSharingGroups, which give you a way to share
>>   resources across tasks. They behave more dynamically, which is
>>   important for more dynamic environments, and for cases where a
>>   cluster has fewer task slots than the parallelism of some tasks.
>> - For cases that need strict co-location of tasks, we now have
>>   CoLocationConstraints. The Batch API uses them to ensure that the
>>   head, tail, and tasks inside a closed-loop iteration are co-located.
>>
>> Stephan
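[Editor's note] Since the JobManager no longer rejects disconnected graphs, the "at least one connected source and sink" check mentioned above has to live somewhere else. Below is a minimal sketch of what such a validation could look like, using a simplified adjacency-map graph model; the class and method names are illustrative, not Flink's actual JobGraph API.

```java
import java.util.*;

// Hypothetical validator: a graph passes if some source (no incoming
// edges) can reach some distinct sink (no outgoing edges) via BFS.
class FlowValidator {
    static boolean hasConnectedSourceAndSink(Map<String, List<String>> edges,
                                             Set<String> vertices) {
        // Vertices that appear as a target of some edge have incoming edges.
        Set<String> hasIncoming = new HashSet<>();
        for (List<String> targets : edges.values()) hasIncoming.addAll(targets);

        for (String v : vertices) {
            if (hasIncoming.contains(v)) continue; // not a source
            // BFS from this source; accept on reaching a distinct sink.
            Deque<String> queue = new ArrayDeque<>(List.of(v));
            Set<String> seen = new HashSet<>(List.of(v));
            while (!queue.isEmpty()) {
                String cur = queue.poll();
                List<String> out = edges.getOrDefault(cur, List.of());
                if (out.isEmpty() && !cur.equals(v)) return true; // sink found
                for (String next : out) {
                    if (seen.add(next)) queue.add(next);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = new HashMap<>();
        edges.put("source", List.of("map"));
        edges.put("map", List.of("sink"));
        Set<String> vertices = Set.of("source", "map", "sink");
        System.out.println(hasConnectedSourceAndSink(edges, vertices));

        // An isolated vertex is rejected: it never reaches a distinct sink.
        System.out.println(hasConnectedSourceAndSink(new HashMap<>(),
                                                     Set.of("lonely")));
    }
}
```

A real check in the streaming API would of course run against the actual graph classes before job submission; the point here is only that the connectivity property is cheap to verify with one traversal per source.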
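[Editor's note] The "wiring is now the other way around" point in the quoted mail can be made concrete with a toy model: the consumer initiates the connection and grabs the producer's intermediate data set, instead of the producer registering the consumer as a forward target. The classes below are simplified stand-ins, not Flink's real JobGraph structures.

```java
import java.util.*;

// Stand-in for an intermediate data set produced by one vertex.
class DataSetHandle {
    final Vertex producer;
    DataSetHandle(Vertex producer) { this.producer = producer; }
}

// Stand-in for a job vertex; wiring happens on the receiver side.
class Vertex {
    final String name;
    final List<DataSetHandle> producedDataSets = new ArrayList<>();
    final List<DataSetHandle> inputs = new ArrayList<>();

    Vertex(String name) { this.name = name; }

    // Receiver-side wiring: this vertex attaches a new data set of
    // `producer` as one of its inputs.
    void connectDataSetAsInput(Vertex producer) {
        DataSetHandle ds = new DataSetHandle(producer);
        producer.producedDataSets.add(ds);
        inputs.add(ds);
    }
}

class WiringDemo {
    public static void main(String[] args) {
        Vertex source = new Vertex("source");
        Vertex sink = new Vertex("sink");
        // The consumer, not the producer, makes the connection.
        sink.connectDataSetAsInput(source);
        System.out.println(sink.inputs.get(0).producer.name);
    }
}
```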
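[Editor's note] The slot-sharing point is worth a small arithmetic illustration: when one subtask of each operator in a sharing group may co-reside in a slot, the group needs only max(parallelism) slots rather than sum(parallelism), which is why a job can fit a cluster with fewer slots than the parallelism of its widest task would otherwise require. This is a toy calculation, not the scheduler's code.

```java
import java.util.*;

// Toy model of slot sharing: compare slot demand with and without a
// shared group. Operator names and numbers are made up for illustration.
class SlotSharingDemo {
    static int slotsWithoutSharing(Map<String, Integer> parallelism) {
        return parallelism.values().stream().mapToInt(Integer::intValue).sum();
    }

    static int slotsWithSharing(Map<String, Integer> parallelism) {
        return parallelism.values().stream().mapToInt(Integer::intValue).max().orElse(0);
    }

    public static void main(String[] args) {
        Map<String, Integer> p = new LinkedHashMap<>();
        p.put("source", 2);
        p.put("map", 8);
        p.put("sink", 4);
        // Without sharing: 2 + 8 + 4 = 14 slots.
        // With one sharing group: max(2, 8, 4) = 8 slots.
        System.out.println(slotsWithoutSharing(p)); // 14
        System.out.println(slotsWithSharing(p));    // 8
    }
}
```

CoLocationConstraints go one step further than sharing: they force specific subtasks (e.g. iteration head and tail) into the same slot, not merely allow it.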
