Hi Stephan / others interested,

I have been working on the flink-mesos integration and there are some
thoughts that I would like to share about the commonalities with the
flink-yarn integration.

* Both flink-mesos and flink-yarn integration as they stand today can be
considered as "coarse-grained" scheduling integrations. This means that the
tasks that are spawned (the task managers) are long-lived.

* MaGuoWei is referring to something (as correctly identified by Stephan)
that I like to call a "fine-grained" scheduling integration, where the task
managers are relinquished by the framework when they aren't being utilised
by Flink. This means that when the next job is executed, the job manager
and/or framework will spawn new task managers. This also has an implied
requirement that each taskManager runs one task and is then discarded.

* Coarse-grained scheduling is preferable when we want interactive
(sub-second) responses and waiting for a resource offer to be accepted and a
new taskManager JVM to spin up is not acceptable. The downside is that
long-lived task managers may lead to underutilisation of the shared
cluster.

* Fine-grained scheduling is preferable when a little delay (due to starting
a new taskManager JVM) is acceptable. It gives us higher utilisation of the
cluster in a shared setting, as resources that aren't being used are
relinquished. But we need to think a lot more carefully about this
approach. Some of the cases that I can think of are:
  * The jobManager/integration-framework may need to monitor the utilisation
of the taskManagers and kill off taskManagers based on some cool-down
timeout.
  * The taskManagers that are being killed off may hold resources that are
still needed by other tasks (files/intermediate results etc.), so they
can't always be killed off. This means that some sort of "are you idle?"
handshake needs to be done first.
  * I like "fine-grained" mode, but there may need to be a middle ground
where tasks are "coarse-grained", i.e. they run multiple operators, and
once idle for a certain amount of time they are reaped/killed off by the
jobManager/integration-framework.
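To make the reaping idea above concrete, here is a minimal sketch in Java of how the jobManager/integration-framework side might combine the cool-down timeout with the "are you idle?" handshake. All names here (IdleReaper, TaskManagerHandle, confirmIdle) are hypothetical, not actual Flink classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: track when each taskManager was last active and
 * reap only those that (a) have been idle past a cool-down timeout and
 * (b) confirm via an "are you idle?" handshake that they hold no state
 * (files, intermediate results) still needed by other tasks.
 */
public class IdleReaper {

    /** Hypothetical view of a taskManager from the framework's side. */
    public interface TaskManagerHandle {
        long lastActiveMillis();   // last time a task ran on it
        boolean confirmIdle();     // the "are you idle?" handshake
    }

    private final long coolDownMillis;

    public IdleReaper(long coolDownMillis) {
        this.coolDownMillis = coolDownMillis;
    }

    /** Returns the ids of taskManagers that may safely be killed off. */
    public List<String> selectReapable(Map<String, TaskManagerHandle> tms, long nowMillis) {
        List<String> reapable = new ArrayList<>();
        for (Map.Entry<String, TaskManagerHandle> e : tms.entrySet()) {
            TaskManagerHandle tm = e.getValue();
            boolean pastCoolDown = nowMillis - tm.lastActiveMillis() >= coolDownMillis;
            // Even a long-idle taskManager may still hold intermediate
            // results, so the handshake has the final say.
            if (pastCoolDown && tm.confirmIdle()) {
                reapable.add(e.getKey());
            }
        }
        return reapable;
    }
}
```

The point of the two-step check is that the timeout alone is not sufficient: an idle-looking taskManager may still be serving intermediate results to other tasks.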

* Ideally, we would want to isolate the logic (a general scheduler) that
says "get me a slot meeting constraints X" into one module, which utilises
another module (Yarn or Mesos) that takes such a request and satisfies the
needs of the former. This idea is loosely inspired by the way this
separation exists in Apache Spark, where it seems to work out well.
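To illustrate the module split I have in mind, here is a rough Java sketch: a general scheduler that only speaks in terms of "get me a slot meeting constraints X", and a pluggable resource-provider interface that a Yarn- or Mesos-specific module would implement behind it. All types here are hypothetical, not actual Flink (or Spark) classes.

```java
/**
 * Hypothetical sketch of the proposed separation: the general scheduler
 * knows nothing about Yarn or Mesos; it only talks to a ResourceProvider
 * interface that the cluster-specific module implements.
 */
public class SchedulerSeparation {

    /** "Constraints X": e.g. minimum memory/cpu for the requested slot. */
    public static class SlotConstraints {
        public final int memoryMb;
        public final int cpuCores;
        public SlotConstraints(int memoryMb, int cpuCores) {
            this.memoryMb = memoryMb;
            this.cpuCores = cpuCores;
        }
    }

    /** What the Yarn/Mesos-specific module must implement. */
    public interface ResourceProvider {
        /** Returns a container/slot id, or null if the request cannot be met. */
        String allocateSlot(SlotConstraints constraints);
    }

    /** The general scheduler: issues slot requests, agnostic of the backend. */
    public static class GeneralScheduler {
        private final ResourceProvider provider;

        public GeneralScheduler(ResourceProvider provider) {
            this.provider = provider;
        }

        public String requestSlot(SlotConstraints constraints) {
            String slot = provider.allocateSlot(constraints);
            if (slot == null) {
                throw new IllegalStateException("no resources for request");
            }
            return slot;
        }
    }
}
```

Swapping Yarn for Mesos (or a standalone deployment) would then only mean supplying a different ResourceProvider implementation; the scheduler itself stays untouched.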

* I don't know the codebase well enough to say where these things should
go, but based on my reading of the overall architecture of the system,
there is nothing that can't be satisfied by flink-runtime, and it *should*
not need any detailed access to the execution plan. I'll defer this to
someone who knows the internals better.



--
View this message in context: 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/add-some-new-api-to-the-scheduler-in-the-job-manager-tp7153p7448.html
Sent from the Apache Flink mailing list archive at Nabble.com.