[
https://issues.apache.org/jira/browse/FLINK-31655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17711663#comment-17711663
]
tartarus commented on FLINK-31655:
----------------------------------
Hi [~akalash], thanks for the reminder!
At our company, the adaptive partitioner is an optional optimization that is
disabled by default; it is usually enabled for jobs with heavy external I/O.
Compared with the backpressure and lag caused by heavily loaded nodes, the
overhead of channel selection is acceptable, and the feature brings a positive
gain of about 20%.
A few days ago I wrote a doc, [Adaptive Channel selection for
partitioner|https://docs.google.com/document/d/1nH4hma8wyT8IcrwmJNfc9sBc5FXeyTtt96N-ZJ-ZgFk/edit?usp=sharing].
Professional advice is very welcome.
{*}Solution 1{*}: simple to implement, but every selectChannel call has to
traverse all channels, which adds some performance overhead. (It has already
been verified in production; I'm now trying to measure the overhead at different
parallelisms with a benchmark-style test.)
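For illustration only, a minimal Java sketch of what Solution 1 could look like, assuming a per-channel backlog counter (for example, queued buffers per subpartition) is available. The class `LeastBacklogSelector` and its `onBufferQueued`/`onBufferConsumed` hooks are hypothetical and are not Flink API; the point is the O(numberOfChannels) scan on every selectChannel call, which is the overhead mentioned above.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical sketch of "Solution 1" (not Flink API): scan every channel on each
// selectChannel() call and pick the one with the smallest backlog.
class LeastBacklogSelector {
    // Queued buffers per channel (assumed metric). Atomic because, in this sketch,
    // the consumer side may decrement the counters from another thread.
    private final AtomicIntegerArray backlog;
    // Rotating scan start so that ties are broken round-robin.
    // Assumes selectChannel() is only called from a single writer thread.
    private int nextStart = 0;

    LeastBacklogSelector(int numberOfChannels) {
        this.backlog = new AtomicIntegerArray(numberOfChannels);
    }

    // Hypothetical hook: a buffer was enqueued to a channel.
    void onBufferQueued(int channel) {
        backlog.incrementAndGet(channel);
    }

    // Hypothetical hook: a buffer was consumed from a channel.
    void onBufferConsumed(int channel) {
        backlog.decrementAndGet(channel);
    }

    // O(numberOfChannels) per call: this traversal is the overhead of Solution 1.
    int selectChannel() {
        int n = backlog.length();
        int best = nextStart;
        int bestLoad = backlog.get(best);
        for (int i = 1; i < n; i++) {
            int ch = (nextStart + i) % n;
            int load = backlog.get(ch);
            if (load < bestLoad) {
                best = ch;
                bestLoad = load;
            }
        }
        nextStart = (nextStart + 1) % n;
        return best;
    }
}
```

Rotating the scan start means that when all backlogs are equal the selector degrades to plain round-robin instead of always favoring channel 0.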
{*}Solution 2{*}: selectChannel has almost no performance overhead, but the
solution requires additional operations in the hot path, and those operations
have to run under a lock, so it calls for more caution, discussion, and
validation.
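One possible shape of Solution 2, again as a hypothetical Java sketch rather than what the doc necessarily proposes: selectChannel only reads a cached index, while every backlog update recomputes the least-loaded channel inside a lock. `CachedLeastBacklogSelector` and its hooks are illustrative names, not Flink API. The sketch moves the cost from the selection path to the update path, which is the trade-off described above.

```java
// Hypothetical sketch of "Solution 2" (not Flink API): O(1) selection,
// with the extra work moved into lock-protected backlog updates.
class CachedLeastBacklogSelector {
    private final int[] backlog;             // queued buffers per channel, guarded by lock
    private final Object lock = new Object();
    private volatile int leastLoaded = 0;    // read without locking in selectChannel()

    CachedLeastBacklogSelector(int numberOfChannels) {
        this.backlog = new int[numberOfChannels];
    }

    // Hypothetical hook: a buffer was enqueued to a channel.
    void onBufferQueued(int channel) {
        update(channel, 1);
    }

    // Hypothetical hook: a buffer was consumed from a channel.
    void onBufferConsumed(int channel) {
        update(channel, -1);
    }

    private void update(int channel, int delta) {
        synchronized (lock) {
            backlog[channel] += delta;
            // Extra work on the hot path: recompute the minimum while holding the lock.
            int best = 0;
            for (int ch = 1; ch < backlog.length; ch++) {
                if (backlog[ch] < backlog[best]) {
                    best = ch;
                }
            }
            leastLoaded = best;
        }
    }

    // O(1): no traversal and no locking on the selection path.
    int selectChannel() {
        return leastLoaded;
    }
}
```

Because the cached index changes only when a backlog changes, consecutive records go to the same channel until the next buffer is queued or consumed.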
I looked at your implementation of LoadBasedRecordWriter. Unless I have missed
some detail, {color:#ff8b00}SubpartitionStatistic{color} is only updated at emit
time, so this statistic does not reflect the actual processing capacity of the
downstream operators. Perhaps we are aiming at different goals: what I want to
solve is the impact of heavily loaded nodes on Flink job throughput.
Looking forward to more feedback.
> Adaptive Channel selection for partitioner
> ------------------------------------------
>
> Key: FLINK-31655
> URL: https://issues.apache.org/jira/browse/FLINK-31655
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Task
> Reporter: tartarus
> Assignee: tartarus
> Priority: Major
>
> In Flink, if the upstream and downstream operators have different parallelism,
> the RebalancePartitioner is used by default to select the target channel.
> At our company, users often use Flink to access Redis, HBase, or other RPC
> services. If some of the operator subtasks are slow to get responses (because
> of the external service), the job easily becomes backpressured, because
> Rebalance/Rescale use a round-robin channel selection policy.
> Since the Rebalance/Rescale policy does not care which downstream subtask the
> data is sent to, we would like Rebalance/Rescale to take the processing power
> of the downstream subtasks into account when choosing a channel.
> Sending more data to idle subtasks ensures the best possible throughput for the
> job!
>
>
>
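To make the round-robin problem described in the issue concrete, here is a toy Java simulation (all names hypothetical). `RoundRobinSelector` mimics the rotating choice of RebalancePartitioner but is not the Flink class: the real partitioner starts at a random channel, whereas here the start is a parameter so the result is deterministic. Three downstream subtasks drain 10, 10 and 2 records per tick, and the upstream emits 18 records per tick, which is less than the total downstream capacity of 22.

```java
// Simplified round-robin selection in the spirit of RebalancePartitioner
// (not the actual Flink class): the choice ignores how busy each subtask is.
class RoundRobinSelector {
    private final int numberOfChannels;
    private int next;

    RoundRobinSelector(int numberOfChannels, int start) {
        this.numberOfChannels = numberOfChannels;
        this.next = start;
    }

    int selectChannel() {
        int ch = next;
        next = (next + 1) % numberOfChannels;
        return ch;
    }
}

class RoundRobinBackpressureDemo {
    // Toy model: each tick the upstream emits emitPerTick records, then each
    // downstream subtask i drains up to capacity[i] records from its queue.
    // Returns the queue length per subtask after the given number of ticks.
    static int[] simulate(int[] capacity, int emitPerTick, int ticks) {
        int n = capacity.length;
        int[] queue = new int[n];
        RoundRobinSelector selector = new RoundRobinSelector(n, 0);
        for (int t = 0; t < ticks; t++) {
            for (int r = 0; r < emitPerTick; r++) {
                queue[selector.selectChannel()]++;
            }
            for (int i = 0; i < n; i++) {
                queue[i] = Math.max(0, queue[i] - capacity[i]);
            }
        }
        return queue;
    }

    public static void main(String[] args) {
        // Subtask 2 is "slow", e.g. waiting on an external RPC service.
        int[] queues = simulate(new int[] {10, 10, 2}, 18, 100);
        System.out.println(java.util.Arrays.toString(queues)); // prints [0, 0, 400]
    }
}
```

Round-robin gives the slow subtask a third of the records, so its queue grows by 4 records per tick while the other two subtasks sit partly idle, even though the downstream side as a whole could keep up with the input.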
--
This message was sent by Atlassian Jira
(v8.20.10#820010)