[ https://issues.apache.org/jira/browse/KAFKA-12226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Randall Hauch updated KAFKA-12226:
----------------------------------
    Fix Version/s: 3.0.1
                   3.2.0

> High-throughput source tasks fail to commit offsets
> ---------------------------------------------------
>
>                 Key: KAFKA-12226
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12226
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>            Reporter: Chris Egerton
>            Assignee: Chris Egerton
>            Priority: Major
>             Fix For: 3.1.0, 3.0.1, 3.2.0
>
>
> The source task thread has the following workflow:
> # Poll messages from the source task.
> # Queue these messages to the producer, which sends them to Kafka asynchronously.
> # Add each message to outstandingMessages, or, if a flush is currently active, to outstandingMessagesBacklog.
> # When the producer completes the send of a record, remove it from outstandingMessages.
>
> The offset commit thread has the following workflow:
> # Wait a flat timeout for outstandingMessages to flush completely.
> # If this times out, move all records from outstandingMessagesBacklog into outstandingMessages and reset.
> # If the flush succeeds, commit the source task offsets to the backing store.
> # Retry the above on a fixed schedule.
>
> If the source task produces records faster than the producer can send them, the producer throttles the task thread by blocking in its {{send}} method, waiting at most {{max.block.ms}} for space in {{buffer.memory}} to become available. As a result, the number of records in {{outstandingMessages}} + {{outstandingMessagesBacklog}} is proportional to the size of the producer's memory buffer.
>
> This amount of data may take longer than {{offset.flush.timeout.ms}} to flush, so the flush never succeeds while the source task is rate-limited by the producer memory. The worker may therefore write multiple hours of data to Kafka without ever committing source offsets for the connector.
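> The failure mode above can be sketched as simple arithmetic: the commit thread waits a flat {{offset.flush.timeout.ms}} for every outstanding record to be acknowledged, so the flush succeeds only if the outstanding count can drain within that window. The class and constant names below are illustrative, not the actual Connect runtime code, and the drain rate is an assumed figure for the example.

```java
// Illustrative sketch (not Connect runtime code): a flat flush timeout fails
// whenever the producer is holding more records than can be acknowledged
// within offset.flush.timeout.ms.
public class FlushTimeoutSketch {
    static final long OFFSET_FLUSH_TIMEOUT_MS = 5_000; // offset.flush.timeout.ms (default 5000)
    static final long DRAIN_PER_MS = 10;               // assumed producer ack rate, records/ms

    // Returns true if `outstanding` records can all be acknowledged within the
    // flat timeout, i.e. whether this commit attempt would succeed.
    public static boolean flushSucceeds(long outstanding) {
        long neededMs = (outstanding + DRAIN_PER_MS - 1) / DRAIN_PER_MS; // ceil division
        return neededMs <= OFFSET_FLUSH_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        // A lightly loaded task flushes well within the timeout.
        System.out.println(flushSucceeds(500));        // true
        // A task throttled by the producer keeps roughly buffer.memory worth of
        // records outstanding; if that exceeds DRAIN_PER_MS * timeout, every
        // commit attempt times out and offsets are never committed.
        System.out.println(flushSucceeds(5_000_000));  // false
    }
}
```

> Because the outstanding count is tied to {{buffer.memory}} rather than to the flush timeout, no fixed {{offset.flush.timeout.ms}} value is safe for a task that stays producer-throttled.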
> When the task is then lost due to a worker failure, hours of data that were already successfully written to Kafka will be re-processed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)