[ 
https://issues.apache.org/jira/browse/FLINK-35210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839739#comment-17839739
 ] 

Oleksandr Nitavskyi commented on FLINK-35210:
---------------------------------------------

Thanks [~npfp] for the suggestion. I believe what you propose is often solved 
with a wrapper around KafkaSource, which can act as a layer of indirection for 
many things, e.g. parallelism config.

Meanwhile, could you please elaborate on how a bad parallelism could lead to 
idle tasks? Do you mean the case where the Source parallelism is higher than the 
number of partitions, so that some Source subtasks consume nothing and the 
watermark does not advance unless 
[Idleness|https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/datastream/event-time/generating_watermarks/#dealing-with-idle-sources]
 is configured?
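
To make the question concrete, here is a minimal sketch in plain Java. It assumes partitions are dealt out to source subtasks in a simple round-robin fashion; the class and method names are hypothetical, and this is only a simplified model, not the actual assignment code of the Kafka connector. It counts how many subtasks end up with no partition and how uneven the load is for a given parallelism.

```java
import java.util.Arrays;

public class PartitionAssignmentSketch {

    // Number of partitions owned by each subtask when partitions
    // 0..numPartitions-1 are distributed round-robin (assumed model).
    public static int[] partitionsPerSubtask(int numPartitions, int parallelism) {
        if (numPartitions < 0 || parallelism <= 0) {
            throw new IllegalArgumentException("need numPartitions >= 0 and parallelism > 0");
        }
        int[] counts = new int[parallelism];
        for (int p = 0; p < numPartitions; p++) {
            counts[p % parallelism]++;
        }
        return counts;
    }

    // Subtasks that own no partition and therefore read nothing.
    public static long idleSubtasks(int numPartitions, int parallelism) {
        return Arrays.stream(partitionsPerSubtask(numPartitions, parallelism))
                .filter(c -> c == 0)
                .count();
    }

    // Difference between the most and the least loaded subtask.
    public static int skew(int numPartitions, int parallelism) {
        int[] counts = partitionsPerSubtask(numPartitions, parallelism);
        int max = Arrays.stream(counts).max().getAsInt();
        int min = Arrays.stream(counts).min().getAsInt();
        return max - min;
    }

    public static void main(String[] args) {
        int partitions = 4;
        for (int parallelism : new int[] {3, 4, 6}) {
            System.out.println("parallelism=" + parallelism
                    + " counts=" + Arrays.toString(partitionsPerSubtask(partitions, parallelism))
                    + " idle=" + idleSubtasks(partitions, parallelism)
                    + " skew=" + skew(partitions, parallelism));
        }
    }
}
```

Under this model, a parallelism below the partition count gives uneven load but no idle subtasks, a parallelism equal to the partition count gives neither, and a parallelism above it leaves some subtasks without any partition.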

> Give the option to set automatically the parallelism of the KafkaSource to 
> the number of kafka partitions
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-35210
>                 URL: https://issues.apache.org/jira/browse/FLINK-35210
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Kafka
>            Reporter: Nicolas Perrin
>            Priority: Minor
>
> Currently the parallelism of Flink's `KafkaSource` operator needs to be 
> chosen manually, which can lead to highly skewed tasks if the developer 
> doesn't do this job.
> To avoid this issue, I propose to:
> - retrieve dynamically the number of partitions of the topic using 
> `KafkaConsumer.partitionsFor(topic).size()`,
> - set the parallelism of the stream built from the source based on this value.
>  This way there won't be any idle tasks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
