[ https://issues.apache.org/jira/browse/KAFKA-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992140#comment-14992140 ]
ASF GitHub Bot commented on KAFKA-2748: --------------------------------------- Github user asfgit closed the pull request at: https://github.com/apache/kafka/pull/431 > SinkTasks do not handle rebalances and offset commit properly > ------------------------------------------------------------- > > Key: KAFKA-2748 > URL: https://issues.apache.org/jira/browse/KAFKA-2748 > Project: Kafka > Issue Type: Bug > Components: copycat > Reporter: Ewen Cheslack-Postava > Assignee: Ewen Cheslack-Postava > Fix For: 0.9.0.0 > > > Since the initial SinkTask code was originally written with an early version > of the new consumer, it wasn't setup to handle rebalances properly. Since we > recently added the rebalance listener, we can use it to correctly commit > offsets. However, the existing code also has two issues. First, in the case > of a failure to flush data in the sink task, we are not correctly rewinding > to the last committed offsets. We need to do this since we cannot be sure > what happened to the outstanding data, so we need to reprocess it. > Second, flushing when stopping was not being handled propertly. The existing > code was assuming that as part of SinkTask.stop() we would. However, this did > not make sense since SinkTask.stop() was being invoked before the worker > thread was stopped, so we could end up committing the wrong offsets. Instead, > we need to wait for the worker thread to finish whatever it is currently > doing, do one final flush + commit offsets, and only then invoke stop() to > allow the task to do final cleanup. This is a bit confusing because stop > means different things for source and sink tasks since they have pull vs push > semantics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)