[
https://issues.apache.org/jira/browse/FLINK-34096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Martijn Visser updated FLINK-34096:
-----------------------------------
Priority: Critical (was: Blocker)
> Upscale of kafka source operator leads to some splits getting lost
> ------------------------------------------------------------------
>
> Key: FLINK-34096
> URL: https://issues.apache.org/jira/browse/FLINK-34096
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.18.0
> Reporter: Yang LI
> Priority: Critical
> Attachments: image-2024-01-15-15-46-47-104.png,
> image-2024-01-15-15-47-36-509.png, image-2024-01-15-15-48-07-871.png
>
>
> Hello,
> We've been conducting experiments with Autoscaling in Apache Flink version
> 1.18.0 and encountered a bug associated with the Kafka source split.
> The issue manifested in our system as follows: upon experiencing a sudden
> spike in traffic, the autoscaler opted to upscale the Kafka source vertex.
> However, the Kafka source fetcher failed to retrieve all available Kafka
> partitions. Additionally, we observed duplication in source splits. For
> example, taskmanager-1 and taskmanager-4 both fetched the same Kafka
> partition.
> !image-2024-01-15-15-46-47-104.png|width=423,height=381!
> !image-2024-01-15-15-48-07-871.png|width=719,height=329!
> {noformat}
> taskmanager-4-9 2024-01-09 17:59:37 [Source: kafka_source_input_with_kt ->
> Flat Map -> session_valid (5/18)#6] INFO o.a.f.c.b.s.r.SourceReaderBase -
> Adding split(s) to reader: [[Partition: sf-enriched-4, StartingOffset:
> 26084169, StoppingOffset: -9223372036854775808], [Partition: sf-anonymized-8,
> StartingOffset: 46477069, StoppingOffset: -9223372036854775808], [Partition:
> sf-anonymized-9, StartingOffset: 46121324, StoppingOffset:
> -9223372036854775808], [Partition: sf-enriched-5, StartingOffset: 26221751,
> StoppingOffset: -9223372036854775808]]
> taskmanager-4-6 2024-01-09 17:59:37 [Source: kafka_source_input_with_kt ->
> Flat Map -> session_valid (4/18)#6] INFO o.a.f.c.b.s.r.SourceReaderBase -
> Adding split(s) to reader: [[Partition: sf-enriched-4, StartingOffset:
> 26084169, StoppingOffset: -9223372036854775808], [Partition: sf-anonymized-8,
> StartingOffset: 46477069, StoppingOffset: -9223372036854775808], [Partition:
> sf-anonymized-20, StartingOffset: 46211745, StoppingOffset:
> -9223372036854775808], [Partition: sf-anonymized-32, StartingOffset:
> 46340878, StoppingOffset: -9223372036854775808]] {noformat}
> Additional Questions
> * During some other experiments which also lead to kafka partition issues,
> we noticed that the autoscaler attempted to increase the parallelism of the
> source vertex to a value that is not a divisor of the Kafka topic's partition
> count. For example, it recommended a parallelism of 48 when the total
> partition count was 72. In such scenarios:
> *
> ** Does kafka source connector vertex still suppose to works well when its
> parallelism is not divisor of topic's partition count?
> ** If this configuration is not ideal, should there be a mechanism within
> the autoscaler to ensure that the parallelism of the source vertex always
> matches the topic's partition count?
>
>
>
>
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)