[jira] [Created] (FLINK-10815) Rethink the rescale operation, can we do it async

Shimin Yang (JIRA) Wed, 07 Nov 2018 08:16:47 -0800

Shimin Yang created FLINK-10815:
-----------------------------------

             Summary: Rethink the rescale operation, can we do it async
                 Key: FLINK-10815
                 URL: https://issues.apache.org/jira/browse/FLINK-10815
             Project: Flink
          Issue Type: Improvement
          Components: ResourceManager, Scheduler
            Reporter: Shimin Yang
            Assignee: Shimin Yang



Currently, a rescale operation stops the whole job and restarts it with a 
different parallelism. This is expensive, and recovery takes a long time when 
the state size is large. 

A long-running rescale can also cause other problems, such as increased latency 
and back pressure. In some settings, such as a streaming computing cloud 
service, users may be very sensitive to latency and resource usage, so it would 
be better to make rescaling a cheaper operation.

I wonder if we could make it an async operation, just like a checkpoint. 
Dealing with the keyed state would be painful, though. For now I just want to 
make some assumptions to keep things simpler: the async rescale operation can 
only double the parallelism or halve it.
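
To illustrate why the double-or-halve assumption simplifies things, here is a 
minimal sketch. It assumes keys are hashed into a fixed number of key groups 
and each subtask owns a contiguous range of them. The class name, 
MAX_PARALLELISM and both helper methods are hypothetical names for this 
sketch, not existing Flink code.

```java
// Hypothetical sketch, not Flink code: keys hash into a fixed number of
// key groups, and each subtask owns a contiguous range of key groups.
final class KeyGroupSketch {
    // Assumed fixed number of key groups (a power of two for this sketch).
    static final int MAX_PARALLELISM = 128;

    // Map a key to its key group; floorMod keeps the result non-negative.
    static int keyGroupFor(Object key) {
        return Math.floorMod(key.hashCode(), MAX_PARALLELISM);
    }

    // Map a key group to the subtask that owns it at a given parallelism.
    static int subtaskFor(int keyGroup, int parallelism) {
        return keyGroup * parallelism / MAX_PARALLELISM;
    }

    public static void main(String[] args) {
        int oldParallelism = 4;
        int newParallelism = 8;
        for (int kg = 0; kg < MAX_PARALLELISM; kg++) {
            int oldOwner = subtaskFor(kg, oldParallelism);
            int newOwner = subtaskFor(kg, newParallelism);
            // After doubling, new subtask n only needs state from old
            // subtask n / 2, so each old subtask splits into exactly two.
            if (newOwner / 2 != oldOwner) {
                throw new AssertionError("unexpected owner for key group " + kg);
            }
        }
        System.out.println("each old subtask splits into exactly two new subtasks");
    }
}
```

Under these assumptions a new subtask never has to collect state from more 
than one old subtask, and halving is the same mapping read in reverse.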

In the scale-up case, we can copy the state to the newly created worker 
and change the partitioner of the upstream. The best time to do this might be 
when we are notified that a checkpoint has completed. But we still need to 
change the partitioner of the upstream, so the upstream should buffer its 
results or block computation until the state copy has finished. After that, 
the new partitioner must still send different elements with the same key to 
the same downstream operator.
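
The buffering variant of the scale-up steps above could look roughly like 
this. It is only a sketch under simplifying assumptions: records are plain 
string keys, the partitioner is a simple hash modulo, and the class and method 
names (RescalingRouterSketch, beginRescale, onStateCopyFinished) are 
hypothetical, not part of any Flink API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical upstream router: buffers records while downstream state is
// being copied, then switches to the new parallelism and flushes the buffer.
final class RescalingRouterSketch {
    private int parallelism;
    private boolean rescaling = false;
    private final Queue<String> buffered = new ArrayDeque<>();
    // Stand-in for the network: records each "key->channel" decision.
    final List<String> emitted = new ArrayList<>();

    RescalingRouterSketch(int parallelism) {
        this.parallelism = parallelism;
    }

    // Assumed partitioner: the same key always maps to the same channel.
    int channelFor(String key) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Called when the rescale starts, e.g. on a checkpoint-complete notification.
    void beginRescale() {
        rescaling = true;
    }

    void send(String key) {
        if (rescaling) {
            buffered.add(key); // hold back until the state copy has finished
            return;
        }
        emitted.add(key + "->" + channelFor(key));
    }

    // Called once the new worker has received its copy of the state.
    void onStateCopyFinished(int newParallelism) {
        parallelism = newParallelism;
        rescaling = false;
        while (!buffered.isEmpty()) {
            send(buffered.poll()); // replay in arrival order with the new partitioner
        }
    }
}
```

With a modulo partitioner and doubled parallelism p, a key that used to go to 
channel c ends up on either c or c + p, so the new worker c + p only needs 
state from worker c. Blocking instead of buffering would avoid the unbounded 
queue, at the cost of stalling the upstream.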

In the scale-down case, we can merge the keyed state of two operators 
and also change the partitioner of the upstream.
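
A minimal sketch of the merge step, assuming the keyed state of each operator 
can be viewed as a simple map and that the two operators own disjoint key 
sets (which should hold if the upstream partitions by key). StateMergeSketch 
and merge are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: merge the keyed state of two operators into one.
final class StateMergeSketch {
    static <K, V> Map<K, V> merge(Map<K, V> left, Map<K, V> right) {
        Map<K, V> merged = new HashMap<>(left);
        for (Map.Entry<K, V> entry : right.entrySet()) {
            // Keyed partitioning means a key should live on one operator only;
            // a key present on both sides points to a routing bug.
            if (merged.containsKey(entry.getKey())) {
                throw new IllegalStateException(
                        "key owned by both operators: " + entry.getKey());
            }
            merged.put(entry.getKey(), entry.getValue());
        }
        return merged;
    }
}
```

Because the key sets are disjoint, the merge needs no conflict resolution. 
The harder part is coordinating it with the upstream partitioner change.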



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
