[jira] [Commented] (FLINK-31655) Adaptive Channel selection for partitioner

tartarus (Jira) Wed, 12 Apr 2023 20:43:06 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-31655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17711663#comment-17711663
 ]


tartarus commented on FLINK-31655:
----------------------------------

Hi [~akalash], thanks for the reminder!

In our company, the adaptive partitioner is an optional optimization that is 
disabled by default and is usually enabled for jobs with heavy external IO 
access. The performance overhead of selectChannel is acceptable compared to 
the backpressure and lag caused by high-load nodes, and the feature brings a 
positive gain of about 20%.

I wrote a doc the other day, [Adaptive Channel selection for 
partitioner|https://docs.google.com/document/d/1nH4hma8wyT8IcrwmJNfc9sBc5FXeyTtt96N-ZJ-ZgFk/edit?usp=sharing].
 Your feedback and advice on it are welcome.

{*}Solution 1{*}: Simple to implement, but every selectChannel call has to 
traverse the channels, so there is some performance overhead. [Already 
verified online; I am now measuring the overhead at different parallelism 
levels with a benchmark-style test.]
{*}Solution 2{*}: Almost no performance overhead for selectChannel, but it 
requires additional operations in the hot path, and they must run inside 
locks. This solution calls for more caution and needs more discussion and 
validation.
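
As a rough illustration of the trade-off in Solution 1, the sketch below scans 
every channel on each call and picks the one with the smallest backlog. It is 
not the code from the doc and not Flink's ChannelSelector API: the class name 
LeastBacklogSelector, the backlog array, and the tie-breaking rule are all 
assumptions made for this example.

```java
/**
 * Hypothetical sketch of "traverse on every selectChannel" (Solution 1).
 * Not Flink code: the backlog metric and all names here are assumptions.
 */
final class LeastBacklogSelector {
    private final int numberOfChannels;
    // Rotating start index so that ties fall back to round-robin order.
    private int nextStart = 0;

    LeastBacklogSelector(int numberOfChannels) {
        if (numberOfChannels <= 0) {
            throw new IllegalArgumentException("numberOfChannels must be positive");
        }
        this.numberOfChannels = numberOfChannels;
    }

    /**
     * Picks the channel with the smallest backlog.
     *
     * @param backlog backlog[i] is the assumed number of queued buffers for channel i
     * @return index of the selected channel
     */
    int selectChannel(int[] backlog) {
        if (backlog.length != numberOfChannels) {
            throw new IllegalArgumentException("backlog length must match channel count");
        }
        int best = nextStart;
        // O(numberOfChannels) scan per record: this is the overhead of Solution 1.
        for (int i = 1; i < numberOfChannels; i++) {
            int candidate = (nextStart + i) % numberOfChannels;
            if (backlog[candidate] < backlog[best]) {
                best = candidate;
            }
        }
        nextStart = (best + 1) % numberOfChannels;
        return best;
    }
}
```

When all backlogs are equal, this selector behaves like plain round-robin; a 
channel with a larger backlog is skipped until it drains. The per-record cost 
grows linearly with the number of channels, which is why the overhead needs to 
be measured at different parallelism levels.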

I looked at your implementation of LoadBasedRecordWriter. If I haven't missed 
any details, {color:#ff8b00}SubpartitionStatistic{color} is only updated at 
emit time, so this statistic does not represent the true computing power of 
the downstream operators. Maybe we are aiming at different goals: what I want 
to solve is the impact of high-load nodes on Flink job throughput.

Looking forward to more feedback.

> Adaptive Channel selection for partitioner
> ------------------------------------------
>
>                 Key: FLINK-31655
>                 URL: https://issues.apache.org/jira/browse/FLINK-31655
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Task
>            Reporter: tartarus
>            Assignee: tartarus
>            Priority: Major
>
> In Flink, if the upstream and downstream operator parallelism is not the 
> same, then by default the RebalancePartitioner will be used to select the 
> target channel.
> In our company, users often use Flink to access Redis, HBase, or other RPC 
> services. If some of the operators are slow to return requests (for external 
> service reasons), the job easily becomes backpressured, because the 
> Rebalance/Rescale channel selection policy is round-robin.
> Because the Rebalance/Rescale policy does not care which downstream subtask 
> the data is sent to, we expect Rebalance/Rescale to take the processing 
> power of the downstream subtasks into account when choosing a channel.
> Sending more data to the free subtasks ensures the best possible job 
> throughput!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
