Change in the JobManager API

Stephan Ewen Sat, 20 Sep 2014 11:26:02 -0700

Hi!

I have just pushed a big patch to rework the JobManager job and scheduling
classes. It fixes some scalability and robstness issues,
simplifies the task hierarchies, and makes the code ready for some of the
prepared next features (incremental/interactive jobs).


The pull request is https://github.com/apache/incubator-flink/pull/122

What will affect developers that go against the lower level APIs (like the
streaming parts) is the following:

 - No more distrinction between input/intermediate/output tasks
 - Intermediate data sets have a data structure now. This implies that some
methods change slightly (more in name than in meaning).
   In the future, data sets can be consumed many times, but for now, the
network stack supports only one cosumer.
 - The conceptual change that receivers attach senders as inputs (and grab
their outgoing data streams), rather than senders forwarding to
   receivers means that the wiring of JobGraphs is now the other way around.
 - No more distinction between in-memory and network channels. All channels
have always been automatically in-memory, when senders
   and receiver are co-located. The flag was purely a scheduler hint, which
is obsolete now (see below).


Most importantly:
 - The scheduling is a bit different now. Instread of instance sharing, we
now have SlotSharing Groups, which give you
   a way to share resources across tasks, but they behave more dynamic,
which is important for more dynamic environments,
   and when a cluster has less task slots than the parallelism of some
tasks is.
 - For cases that need strict co-location of tasks, we now have
CoLocationConstraints. The Batch API uses them to ensure that
   head, tail, and tasks inside a closed-loop iteration are co-located.

 Stephan

Change in the JobManager API

Reply via email to