Edit: I have not pushed it, I am about to push ;-) Just needed to rebase on the latest master an tests are pending...
On Sat, Sep 20, 2014 at 8:24 PM, Stephan Ewen <[email protected]> wrote: > Hi! > > I have just pushed a big patch to rework the JobManager job and scheduling > classes. It fixes some scalability and robstness issues, > simplifies the task hierarchies, and makes the code ready for some of the > prepared next features (incremental/interactive jobs). > > The pull request is https://github.com/apache/incubator-flink/pull/122 > > What will affect developers that go against the lower level APIs (like the > streaming parts) is the following: > > - No more distrinction between input/intermediate/output tasks > - Intermediate data sets have a data structure now. This implies that > some methods change slightly (more in name than in meaning). > In the future, data sets can be consumed many times, but for now, the > network stack supports only one cosumer. > - The conceptual change that receivers attach senders as inputs (and grab > their outgoing data streams), rather than senders forwarding to > receivers means that the wiring of JobGraphs is now the other way > around. > - No more distinction between in-memory and network channels. All > channels have always been automatically in-memory, when senders > and receiver are co-located. The flag was purely a scheduler hint, > which is obsolete now (see below). > > > Most importantly: > - The scheduling is a bit different now. Instread of instance sharing, we > now have SlotSharing Groups, which give you > a way to share resources across tasks, but they behave more dynamic, > which is important for more dynamic environments, > and when a cluster has less task slots than the parallelism of some > tasks is. > - For cases that need strict co-location of tasks, we now have > CoLocationConstraints. The Batch API uses them to ensure that > head, tail, and tasks inside a closed-loop iteration are co-located. > > Stephan > >
