[ https://issues.apache.org/jira/browse/KAFKA-12226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Randall Hauch updated KAFKA-12226:
----------------------------------
    Fix Version/s: 3.0.1
                   3.2.0

> High-throughput source tasks fail to commit offsets
> ---------------------------------------------------
>
>                 Key: KAFKA-12226
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12226
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>            Reporter: Chris Egerton
>            Assignee: Chris Egerton
>            Priority: Major
>             Fix For: 3.1.0, 3.0.1, 3.2.0
>
>
> The source task thread currently has the following workflow (a sketch of its 
> bookkeeping follows this list):
>  # Poll messages from the source task.
>  # Queue these messages to the producer, which sends them to Kafka asynchronously.
>  # Add each message to {{outstandingMessages}}, or, if a flush is currently 
> active, to {{outstandingMessagesBacklog}}.
>  # When the producer completes the send of a record, remove it from 
> {{outstandingMessages}}.
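> A minimal sketch of that record accounting, based purely on the steps above 
> (the field names mirror the description; the class itself is illustrative, 
> not the verbatim {{WorkerSourceTask}} code):
> {code:java}
> import java.util.IdentityHashMap;
> import java.util.Map;
> import org.apache.kafka.clients.producer.KafkaProducer;
> import org.apache.kafka.clients.producer.ProducerRecord;
> 
> class SourceTaskDispatchSketch {
>     private final KafkaProducer<byte[], byte[]> producer;
>     private final Map<ProducerRecord<byte[], byte[]>, ProducerRecord<byte[], byte[]>>
>             outstandingMessages = new IdentityHashMap<>();
>     private final Map<ProducerRecord<byte[], byte[]>, ProducerRecord<byte[], byte[]>>
>             outstandingMessagesBacklog = new IdentityHashMap<>();
>     private boolean flushing = false;
> 
>     SourceTaskDispatchSketch(KafkaProducer<byte[], byte[]> producer) {
>         this.producer = producer;
>     }
> 
>     void dispatch(ProducerRecord<byte[], byte[]> record) {
>         synchronized (this) {
>             // Step 3: track the record, routing it to the backlog mid-flush
>             if (flushing)
>                 outstandingMessagesBacklog.put(record, record);
>             else
>                 outstandingMessages.put(record, record);
>         }
>         // Step 2: may block for up to max.block.ms when buffer.memory is full
>         producer.send(record, (metadata, exception) -> {
>             synchronized (this) {
>                 // Step 4: the send completed, so stop tracking the record
>                 outstandingMessages.remove(record);
>                 notifyAll(); // wake a flush that is waiting for the map to drain
>             }
>         });
>     }
> }
> {code}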
> The offset commit thread has the following workflow (also sketched below):
>  # Wait a flat timeout for {{outstandingMessages}} to flush completely.
>  # If this times out, add all of the {{outstandingMessagesBacklog}} back into 
> {{outstandingMessages}} and reset.
>  # If the flush succeeds, commit the source task offsets to the backing store.
>  # Retry the above on a fixed schedule.
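> The commit thread's side of the same lock might look like the following 
> (again only a sketch; {{commitOffsets()}} is a hypothetical stand-in for the 
> actual write to the offset backing store):
> {code:java}
> // Illustrative flush wait; timeoutMs corresponds to offset.flush.timeout.ms
> boolean awaitFlush(long timeoutMs) throws InterruptedException {
>     long deadline = System.currentTimeMillis() + timeoutMs;
>     synchronized (this) {
>         flushing = true;
>         // Step 1: wait a flat timeout for outstandingMessages to drain
>         while (!outstandingMessages.isEmpty()) {
>             long remaining = deadline - System.currentTimeMillis();
>             if (remaining <= 0) {
>                 // Step 2: timed out; fold the backlog back in and give up
>                 outstandingMessages.putAll(outstandingMessagesBacklog);
>                 outstandingMessagesBacklog.clear();
>                 flushing = false;
>                 return false; // nothing committed; retry on the next schedule
>             }
>             wait(remaining);
>         }
>         // Flush succeeded: records sent mid-flush become the new outstanding set
>         outstandingMessages.putAll(outstandingMessagesBacklog);
>         outstandingMessagesBacklog.clear();
>         flushing = false;
>     }
>     // Step 3: every outstanding send completed, so committing is safe
>     commitOffsets(); // hypothetical: writes source offsets to the backing store
>     return true;
> }
> {code}
> The failure mode described below falls directly out of step 2: as long as the 
> map never drains within the timeout, step 3 is never reached.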
> If the source task is producing records faster than the producer can send 
> them, the producer will throttle the task thread by blocking in its 
> {{send}} method, waiting at most {{max.block.ms}} for space to become 
> available in {{buffer.memory}}. This means that the number of records in 
> {{outstandingMessages}} + {{outstandingMessagesBacklog}} is proportional to 
> the size of the producer memory buffer.
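> For reference, these are the settings that couple the two loops (the values 
> shown are the stock defaults; worth re-checking against the version in use):
> {code}
> # producer settings (the per-connector producer created by the worker)
> buffer.memory=33554432       # 32 MB of accumulated, un-sent record data
> max.block.ms=60000           # how long send() blocks once the buffer is full
> 
> # worker setting
> offset.flush.timeout.ms=5000 # flat timeout the commit thread waits for a flush
> {code}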
> That much data may take longer than {{offset.flush.timeout.ms}} to flush, in 
> which case the flush will never succeed while the source task is 
> rate-limited by the producer memory. As a result, we may write multiple 
> hours of data to Kafka without ever committing source offsets for the 
> connector. When the task is then lost to a worker failure, hours of data 
> that were already successfully written to Kafka will be re-processed.
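> To put rough, purely illustrative numbers on it: with the default 32 MB 
> {{buffer.memory}} and a connector sustaining, say, 5 MB/s to the brokers, 
> draining a full buffer takes ~6.4 seconds, already past the default 
> 5-second {{offset.flush.timeout.ms}}, so every flush attempt times out and 
> offsets are never committed for as long as the task stays producer-bound.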



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
