[
https://issues.apache.org/jira/browse/FLINK-30198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17648099#comment-17648099
]
Aitozi commented on FLINK-30198:
--------------------------------
Thanks all for your input. I agree that vertex-level tuning will be more
complex, and a pluggable {{VertexParallelismDecider}} is a good choice.
Perhaps we could also pass some information about the vertex, e.g. its type
({{Calc}}, {{Join}}, {{Local/Global Aggregate}}, ...), to the interface so
users can make more suitable choices.
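To make the idea concrete, below is a minimal sketch of what such a pluggable decider could look like. All names ({{VertexInfo}}, {{PerTaskSizeDecider}}, the {{VertexType}} enum, the byte budgets) are hypothetical and do not match Flink's internal API; the example only illustrates passing vertex-type and hash-edge information into the decision, and applying a smaller per-task data budget to reducer-like vertices.

```java
// Hypothetical sketch only -- names and signatures are illustrative,
// not Flink's actual VertexParallelismDecider API.

// Coarse vertex categories the scheduler could expose to the plugin.
enum VertexType { CALC, JOIN, LOCAL_AGGREGATE, GLOBAL_AGGREGATE, OTHER }

// Minimal view of a job vertex handed to the decider.
record VertexInfo(VertexType type, boolean hasHashEdgeInput, long totalInputBytes) {}

// Pluggable decision point, analogous to the proposed interface.
interface VertexParallelismDecider {
    int decideParallelism(VertexInfo vertex);
}

// Example policy: vertices with hash edge inputs are treated as reducers
// and get a smaller per-task data budget, so the same input volume
// yields a higher parallelism than for mapper vertices.
class PerTaskSizeDecider implements VertexParallelismDecider {
    private final long mapperBytesPerTask;
    private final long reducerBytesPerTask;

    PerTaskSizeDecider(long mapperBytesPerTask, long reducerBytesPerTask) {
        this.mapperBytesPerTask = mapperBytesPerTask;
        this.reducerBytesPerTask = reducerBytesPerTask;
    }

    @Override
    public int decideParallelism(VertexInfo vertex) {
        long budget = vertex.hasHashEdgeInput() ? reducerBytesPerTask : mapperBytesPerTask;
        // Ceiling division: enough subtasks so each stays within its budget.
        return (int) Math.max(1, (vertex.totalInputBytes() + budget - 1) / budget);
    }
}

public class DeciderDemo {
    public static void main(String[] args) {
        // Hypothetical budgets: 1 GiB per mapper task, 256 MiB per reducer task.
        VertexParallelismDecider decider =
                new PerTaskSizeDecider(1024L * 1024 * 1024, 256L * 1024 * 1024);

        VertexInfo mapper =
                new VertexInfo(VertexType.CALC, false, 4L * 1024 * 1024 * 1024);
        VertexInfo reducer =
                new VertexInfo(VertexType.GLOBAL_AGGREGATE, true, 4L * 1024 * 1024 * 1024);

        System.out.println(decider.decideParallelism(mapper));  // 4 GiB / 1 GiB  = 4
        System.out.println(decider.decideParallelism(reducer)); // 4 GiB / 256 MiB = 16
    }
}
```

With vertex-type information available as well, a user implementation could go further, e.g. giving {{Join}} vertices a tighter budget than {{Calc}} vertices.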
> Support AdaptiveBatchScheduler to set per-task size for reducer task
> ---------------------------------------------------------------------
>
> Key: FLINK-30198
> URL: https://issues.apache.org/jira/browse/FLINK-30198
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Reporter: Aitozi
> Priority: Major
>
> When we used the AdaptiveBatchScheduler in our case, we found that it works
> well in most cases, but there is a limitation: there is only one global
> parameter for per-task data size,
> {{jobmanager.adaptive-batch-scheduler.avg-data-volume-per-task}}.
> However, in a map-reduce architecture, the reducer tasks usually have
> more complex computation logic, such as aggregate/sort/join operators. So I
> think it would be nicer if we could set the reducer and mapper tasks' data
> size per task individually.
> Then, how do we distinguish the reducer tasks?
> IMO, we can let the parallelism decider know whether the vertex has any
> hash edge inputs. If it does, it should be a reducer task.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)