Thanks for pointing that out. I personally prefer it this way, since it is no longer necessary to explicitly "close" a DataSet or DataStream with a sink. We need to update the corresponding tests, however.
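For those tests, since disconnected graphs are no longer rejected by the JobManager, the streaming API now has to verify on its own that at least one source is connected to a sink. A minimal sketch of such a check in plain Java (class and method names here are hypothetical, not actual Flink code):

```java
import java.util.*;

// Illustrative sketch only: validate that at least one source vertex
// can reach a sink vertex, i.e. the program has a connected source/sink pair.
class StreamGraphCheck {

    // edges: vertex id -> ids of downstream vertices
    static boolean hasConnectedSourceAndSink(Map<Integer, List<Integer>> edges,
                                             Set<Integer> sources,
                                             Set<Integer> sinks) {
        Deque<Integer> stack = new ArrayDeque<>(sources);
        Set<Integer> visited = new HashSet<>(sources);
        while (!stack.isEmpty()) {
            int v = stack.pop();
            if (sinks.contains(v)) {
                return true; // some sink is reachable from some source
            }
            for (int w : edges.getOrDefault(v, Collections.<Integer>emptyList())) {
                if (visited.add(w)) {
                    stack.push(w);
                }
            }
        }
        return false; // no source reaches any sink
    }
}
```

A depth-first reachability pass like this runs in linear time in the graph size, so it is cheap enough to run at program submission.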
On Sun, Sep 21, 2014 at 6:49 PM, Stephan Ewen <[email protected]> wrote:

> There is one more effect of the changes: since there is no longer a
> distinction between input/output vertices, and since disconnected flows
> are now also accepted, the job manager will no longer reject certain
> graphs that it used to reject.
>
> That is actually desirable, but I think the streaming API made use of
> that behavior to validate that programs have at least a connected
> source and sink.
>
> This now needs checks at a different point.
>
>
> On Sat, Sep 20, 2014 at 8:25 PM, Stephan Ewen <[email protected]> wrote:
>
>> Edit: I have not pushed it, I am about to push ;-)
>>
>> Just needed to rebase on the latest master, and tests are pending...
>>
>> On Sat, Sep 20, 2014 at 8:24 PM, Stephan Ewen <[email protected]> wrote:
>>
>>> Hi!
>>>
>>> I have just pushed a big patch to rework the JobManager job and
>>> scheduling classes. It fixes some scalability and robustness issues,
>>> simplifies the task hierarchies, and makes the code ready for some of
>>> the prepared next features (incremental/interactive jobs).
>>>
>>> The pull request is https://github.com/apache/incubator-flink/pull/122
>>>
>>> What will affect developers who work against the lower-level APIs
>>> (like the streaming parts) is the following:
>>>
>>> - No more distinction between input/intermediate/output tasks.
>>> - Intermediate data sets now have a data structure. This implies that
>>>   some methods change slightly (more in name than in meaning).
>>>   In the future, data sets can be consumed many times, but for now
>>>   the network stack supports only one consumer.
>>> - The conceptual change that receivers attach senders as inputs (and
>>>   grab their outgoing data streams), rather than senders forwarding
>>>   to receivers, means that the wiring of JobGraphs is now the other
>>>   way around.
>>> - No more distinction between in-memory and network channels.
>>>   Channels have always automatically been in-memory when sender and
>>>   receiver are co-located. The flag was purely a scheduler hint,
>>>   which is now obsolete (see below).
>>>
>>>
>>> Most importantly:
>>> - The scheduling is a bit different now. Instead of instance sharing,
>>>   we now have SlotSharingGroups, which give you a way to share
>>>   resources across tasks. They behave more dynamically, which is
>>>   important for more dynamic environments and for clusters that have
>>>   fewer task slots than the parallelism of some tasks.
>>> - For cases that need strict co-location of tasks, we now have
>>>   CoLocationConstraints. The Batch API uses them to ensure that the
>>>   head, tail, and tasks inside a closed-loop iteration are
>>>   co-located.
>>>
>>> Stephan
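As a side note on the slot sharing groups mentioned above: as I understand them, tasks in the same group can share a slot (one subtask per task per slot), so a group needs as many slots as the maximum parallelism among its tasks, rather than the sum. A toy sketch of that accounting in plain Java (names are hypothetical, not the actual scheduler API):

```java
import java.util.*;

// Illustrative sketch: slots required by one sharing group equal the
// maximum parallelism of its tasks, because subtasks of different tasks
// can co-reside in a slot, while subtasks of the same task cannot.
class SlotSharingSketch {

    // taskParallelism: task name -> its parallelism within the group
    static int slotsNeeded(Map<String, Integer> taskParallelism) {
        int max = 0;
        for (int p : taskParallelism.values()) {
            max = Math.max(max, p);
        }
        return max;
    }
}
```

Under this model, a group with a source at parallelism 4, a mapper at 8, and a sink at 2 needs 8 slots rather than 14, which is what makes the behavior friendlier on clusters with fewer slots than the total parallelism.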
