[ https://issues.apache.org/jira/browse/KAFKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933650#comment-14933650 ]
Guozhang Wang commented on KAFKA-2592:
--------------------------------------

The problem is that when we flush RocksDB / the change-log, we do not commit the offset on the upstream Kafka topic at the same time, so RocksDB / the change-log goes "ahead of time". If there is a failure in between, we are highly likely to end up with duplicates. When the user calls commit(), or when we commit periodically, we do all three flushes together (though not atomically yet). The reason we do not commit upon writing to stores is that we do not know whether the current record going through the topology has finished processing, and hence whether it should be considered completed and included in the committed offset.

> Stop Writing the Change-log in store.put() / delete() for Non-transactional Store
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-2592
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2592
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Guozhang Wang
>            Assignee: Yasuhiro Matsuda
>             Fix For: 0.9.0.0
>
>
> Today we keep a dirty threshold and try to send to the change-log in store.put() / delete() when the threshold has been exceeded. Doing this largely increases the likelihood of inconsistent state upon unclean shutdown.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
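The commit sequence described in the comment (flush the local store, flush the change-log, then commit the upstream offsets together at commit time, rather than on every put()/delete()) can be sketched as follows. This is a minimal illustrative sketch: the interface names (`StateStore`, `ChangelogProducer`, `OffsetCommitter`) and the `commitAll` method are hypothetical stand-ins, not the actual Kafka Streams APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitSketch {
    static List<String> log = new ArrayList<>();

    // Hypothetical stand-ins for the three components involved in a commit.
    interface StateStore { void flush(); }
    interface ChangelogProducer { void flush(); }
    interface OffsetCommitter { void commit(); }

    // All three steps happen together at commit time (though, as the
    // comment notes, not atomically): if a failure occurs after steps 1-2
    // but before step 3, the store/change-log are "ahead" of the committed
    // offset, and reprocessing after recovery produces duplicates.
    static void commitAll(StateStore store,
                          ChangelogProducer changelog,
                          OffsetCommitter offsets) {
        store.flush();      // 1. persist local RocksDB state
        changelog.flush();  // 2. ensure change-log records reached Kafka
        offsets.commit();   // 3. only now commit the upstream offsets
    }

    public static void main(String[] args) {
        commitAll(() -> log.add("store.flush"),
                  () -> log.add("changelog.flush"),
                  () -> log.add("offsets.commit"));
        System.out.println(String.join(",", log));
    }
}
```

The key point the sketch captures is the ordering: the upstream offset is committed only after both flushes, so a committed offset never claims records whose state changes were lost.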