[jira] [Commented] (FLINK-34995) flink kafka connector source stuck when partition leader invalid

yansuopeng (Jira) Sun, 07 Apr 2024 00:52:04 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-34995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834629#comment-17834629
 ]


yansuopeng commented on FLINK-34995:
------------------------------------

https://github.com/apache/flink-connector-kafka/pull/91

> flink kafka connector source stuck when partition leader invalid
> ----------------------------------------------------------------
>
>                 Key: FLINK-34995
>                 URL: https://issues.apache.org/jira/browse/FLINK-34995
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Kafka
>    Affects Versions: 1.17.0, 1.19.0, 1.18.1
>            Reporter: yansuopeng
>            Priority: Major
>              Labels: pull-request-available
>
> When a partition's leader is invalid (leader=-1), a Flink streaming job using 
> KafkaSource cannot restart, and a new instance cannot start even with a new 
> group id. The job gets stuck and throws the following exception:
> "org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms 
> expired before the position for partition aaa-1 could be determined"
> When leader=-1, Kafka APIs such as KafkaConsumer.position() block until 
> either the position can be determined or an unrecoverable error is 
> encountered; here the call ends with the timeout shown above.
> In fact, leader=-1 is not easy to avoid: even with three replicas, three 
> disks going offline together will trigger the problem, especially when the 
> cluster is relatively large. Recovery relies on the Kafka administrator 
> fixing it in time, which is risky during the Kafka cluster's peak period.
> I have solved this problem and would like to create a PR. 
>  
>  
>  
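For illustration only, here is a minimal sketch of one possible mitigation: separating partitions that have a leader from those that do not before any position lookup is attempted. It is not the change in the pull request linked above, and it does not call Kafka or Flink. PartitionMeta, select() and the sample leader ids are hypothetical stand-ins for the partition metadata a client would see, and the only fact taken from the report is that a missing leader shows up as leader=-1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LeaderlessPartitionFilter {

    // Leader id the report associates with a partition that has no valid leader.
    static final int NO_LEADER = -1;

    // Hypothetical stand-in for partition metadata; not a Kafka or Flink class.
    static final class PartitionMeta {
        final String topic;
        final int partition;
        final int leaderId;

        PartitionMeta(String topic, int partition, int leaderId) {
            this.topic = topic;
            this.partition = partition;
            this.leaderId = leaderId;
        }

        boolean hasLeader() {
            return leaderId != NO_LEADER;
        }

        @Override
        public String toString() {
            return topic + "-" + partition;
        }
    }

    // Returns the partitions whose hasLeader() equals wantLeader, in input order.
    static List<PartitionMeta> select(List<PartitionMeta> all, boolean wantLeader) {
        List<PartitionMeta> out = new ArrayList<>();
        for (PartitionMeta p : all) {
            if (p.hasLeader() == wantLeader) {
                out.add(p);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample data: partition aaa-1 has lost its leader, as in the report.
        List<PartitionMeta> all = Arrays.asList(
                new PartitionMeta("aaa", 0, 3),
                new PartitionMeta("aaa", 1, NO_LEADER),
                new PartitionMeta("aaa", 2, 5));
        System.out.println("query position now: " + select(all, true));
        System.out.println("defer until a leader is elected: " + select(all, false));
    }
}
```

Under these assumptions, only the partitions in the first list would be passed to a blocking call such as KafkaConsumer.position(), while the deferred ones would be retried after a later metadata refresh. Whether deferring is acceptable depends on the job's delivery requirements.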



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
