[jira] [Commented] (FLINK-22887) Backlog based optimizations for RebalancePartitioner and RescalePartitioner

Anton Kalashnikov (Jira) Mon, 21 Jun 2021 08:09:36 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-22887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366660#comment-17366660
 ]


Anton Kalashnikov commented on FLINK-22887:
-------------------------------------------

Yes, sure. My experiment can be found here - 
[https://github.com/apache/flink/pull/16224] (but it is still a pretty raw 
draft). The main conclusion is that information like "bytes/buffers received to 
send" and "bytes/buffers sent" can indeed be collected cheaply (without extra 
synchronization or any extra requests). It is also possible to develop a 
constant-complexity algorithm that selects a good, though maybe not the best, 
candidate for sending the data. So even my changes, which should not influence 
performance much (I have not run the benchmarks yet), show pretty impressive 
results.
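
To illustrate the idea, here is a minimal sketch of what a constant-time 
selection could look like. It is not the code from the PR above: the class 
BacklogAwareSelector and all of its methods are hypothetical names, and it 
assumes the backlog of a channel can be approximated as "buffers received to 
send" minus "buffers sent", with each counter updated by a single writer.

```java
/** Hypothetical sketch, not Flink API: picks a channel in O(1) using per-channel backlog. */
final class BacklogAwareSelector {
    // Assumption: plain counters with one writer each, so no locking is used here.
    private final long[] enqueued; // buffers handed to each channel for sending
    private final long[] sent;     // buffers each channel has actually sent
    private int next;              // round-robin cursor

    BacklogAwareSelector(int numberOfChannels) {
        if (numberOfChannels <= 0) {
            throw new IllegalArgumentException("numberOfChannels must be positive");
        }
        this.enqueued = new long[numberOfChannels];
        this.sent = new long[numberOfChannels];
    }

    void onBufferEnqueued(int channel) {
        enqueued[channel]++;
    }

    void onBufferSent(int channel) {
        sent[channel]++;
    }

    /** Approximate load of a receiver: buffers queued but not yet sent. */
    long backlog(int channel) {
        return enqueued[channel] - sent[channel];
    }

    /**
     * Compares the round-robin candidate with its successor and returns the one
     * with the smaller backlog. Only two channels are inspected, so the cost is
     * constant; the result is a good candidate, not necessarily the best one.
     */
    int selectChannel() {
        int first = next;
        int second = (next + 1) % enqueued.length;
        next = second; // advance the cursor by one, as plain round-robin would
        return backlog(second) < backlog(first) ? second : first;
    }
}
```

With equal backlogs the sketch returns the round-robin candidate itself, so a 
channel can be chosen twice in a row; it only skips ahead when the next 
channel is less loaded.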

> Backlog based optimizations for RebalancePartitioner and RescalePartitioner
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-22887
>                 URL: https://issues.apache.org/jira/browse/FLINK-22887
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Network
>    Affects Versions: 1.13.1
>            Reporter: Jiayi Liao
>            Priority: Major
>
> {{RebalancePartitioner}} uses round-robin to distribute the records, but this 
> may not work as expected, because the environments and the processing ability 
> of the downstream tasks may differ from each other. In such cases, the 
> throughput of the whole job will be limited by the slowest downstream 
> subtask, which is very similar to the "HASH" scenario.
> Instead, now that the credit-based mechanism has been introduced, we can 
> leverage the {{backlog}} on the sender side to identify the "load" on each 
> receiver side, which helps us distribute the data more fairly in 
> {{RebalancePartitioner}} and {{RescalePartitioner}}. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
