[
https://issues.apache.org/jira/browse/KAFKA-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992140#comment-14992140
]
ASF GitHub Bot commented on KAFKA-2748:
---------------------------------------
Github user asfgit closed the pull request at:
https://github.com/apache/kafka/pull/431
> SinkTasks do not handle rebalances and offset commit properly
> -------------------------------------------------------------
>
> Key: KAFKA-2748
> URL: https://issues.apache.org/jira/browse/KAFKA-2748
> Project: Kafka
> Issue Type: Bug
> Components: copycat
> Reporter: Ewen Cheslack-Postava
> Assignee: Ewen Cheslack-Postava
> Fix For: 0.9.0.0
>
>
> Since the initial SinkTask code was originally written with an early version
> of the new consumer, it wasn't setup to handle rebalances properly. Since we
> recently added the rebalance listener, we can use it to correctly commit
> offsets. However, the existing code also has two issues. First, in the case
> of a failure to flush data in the sink task, we are not correctly rewinding
> to the last committed offsets. We need to do this since we cannot be sure
> what happened to the outstanding data, so we need to reprocess it.
> Second, flushing when stopping was not being handled propertly. The existing
> code was assuming that as part of SinkTask.stop() we would. However, this did
> not make sense since SinkTask.stop() was being invoked before the worker
> thread was stopped, so we could end up committing the wrong offsets. Instead,
> we need to wait for the worker thread to finish whatever it is currently
> doing, do one final flush + commit offsets, and only then invoke stop() to
> allow the task to do final cleanup. This is a bit confusing because stop
> means different things for source and sink tasks since they have pull vs push
> semantics.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)