Ewen Cheslack-Postava created KAFKA-2748:
--------------------------------------------

             Summary: SinkTasks do not handle rebalances and offset commit 
properly
                 Key: KAFKA-2748
                 URL: https://issues.apache.org/jira/browse/KAFKA-2748
             Project: Kafka
          Issue Type: Bug
          Components: copycat
            Reporter: Ewen Cheslack-Postava
            Assignee: Ewen Cheslack-Postava
             Fix For: 0.9.0.0


Since the initial SinkTask code was originally written with an early version of 
the new consumer, it wasn't setup to handle rebalances properly. Since we 
recently added the rebalance listener, we can use it to correctly commit 
offsets. However, the existing code also has two issues. First, in the case of 
a failure to flush data in the sink task, we are not correctly rewinding to the 
last committed offsets. We need to do this since we cannot be sure what 
happened to the outstanding data, so we need to reprocess it. 

Second, flushing when stopping was not being handled propertly. The existing 
code was assuming that as part of SinkTask.stop() we would. However, this did 
not make sense since SinkTask.stop() was being invoked before the worker thread 
was stopped, so we could end up committing the wrong offsets. Instead, we need 
to wait for the worker thread to finish whatever it is currently doing, do one 
final flush + commit offsets, and only then invoke stop() to allow the task to 
do final cleanup. This is a bit confusing because stop means different things 
for source and sink tasks since they have pull vs push semantics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to