Scheduling task slots in round-robin

Gyula Fóra Mon, 13 Jun 2016 03:47:35 -0700

Hey,

The Flink scheduling mechanism has become quite a bit of a pain lately for
us when trying to schedule IO heavy streaming jobs. And by IO heavy I mean
it has a fairly large state that is being continuously updated/read.


The main problem is that the scheduled task slots are not evenly
distributed among the different task managers but usually the first TM
takes as much slots as possibles and the other TMs get much fewer. And
since the job is RocksDB IO bound the uneven load causes a significant
performance penalty.

This is further accentuated during historical runs when we are trying to
"fast-forward" the application. The difference can be quite substantial in
a 3-4 node cluster: with even task distribution the history might run 3
times faster compared to an uneven one.

I was wondering if there was a simple way to modify the scheduler so it
allocates resources in a round-robin fashion. Probably someone has a lot of
experience with this already :) (I'm running 1.0.3 for this job btw)

Cheers,
Gyula

Scheduling task slots in round-robin

Reply via email to