[ https://issues.apache.org/jira/browse/KAFKA-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503897#comment-16503897 ]
James Cheng commented on KAFKA-5510:
------------------------------------

KIP-211 should address the issue of offsets disappearing on low-traffic partitions: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets]. Not sure when that is going to get into core, though.

> Streams should commit all offsets regularly
> -------------------------------------------
>
>                 Key: KAFKA-5510
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5510
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Major
>
> Currently, Streams commits offsets only for partitions it processed records
> for. Thus, if a partition does not receive any data for longer than
> {{offsets.retention.minutes}} (default 1 day), the latest committed offset
> is lost. On failure or restart, {{auto.offset.reset}} kicks in, potentially
> resulting in reprocessing old data.
> Thus, Streams should commit _all_ offsets on a regular basis. Not sure what
> the overhead of a commit is -- if it's too expensive to commit all offsets on
> every regular commit, we could also have a second config that specifies a
> "commit.all.interval".
> This relates to https://issues.apache.org/jira/browse/KAFKA-3806, so we
> should sync to get a solid overall solution.
> At the same time, it might be better to change the semantics of
> {{offsets.retention.minutes}} in the first place. It might be better to apply
> this setting only if the consumer group is completely dead (and not on a "last
> commit" and "per partition" basis). Thus, this JIRA would be a workaround fix
> if core cannot be changed quickly enough.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
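
For illustration only, the failure mode described in the issue can be reproduced with a toy model. This is a rough sketch, not Kafka or Streams code: `OffsetStore`, `run`, the partition names and the 30-minute commit interval are all hypothetical, and the expiration rule is a simplified reading of the per-partition, last-commit-based behaviour the issue describes. It assumes a retention of one day, as stated above.

```python
from dataclasses import dataclass, field

# Assumed retention of 1 day, matching the default quoted in the issue.
RETENTION_MINUTES = 24 * 60


@dataclass
class OffsetStore:
    """Toy model: each partition's offset expires based on its own last commit."""
    retention: int = RETENTION_MINUTES
    # partition name -> (committed offset, commit time in minutes)
    committed: dict = field(default_factory=dict)

    def commit(self, offsets, now):
        for partition, offset in offsets.items():
            self.committed[partition] = (offset, now)

    def expire(self, now):
        # Drop every entry whose last commit is older than the retention period.
        self.committed = {
            p: (o, t) for p, (o, t) in self.committed.items()
            if now - t <= self.retention
        }

    def fetch(self, partition):
        entry = self.committed.get(partition)
        return None if entry is None else entry[0]


def run(commit_all, minutes=3 * 24 * 60, commit_interval=30):
    """Simulate three days with one busy and one idle partition."""
    store = OffsetStore()
    positions = {"busy-0": 0, "idle-0": 42}
    store.commit(positions, now=0)
    for now in range(commit_interval, minutes + 1, commit_interval):
        positions["busy-0"] += 10  # only the busy partition receives records
        if commit_all:
            to_commit = dict(positions)  # proposed behaviour: commit everything
        else:
            to_commit = {"busy-0": positions["busy-0"]}  # behaviour described in the issue
        store.commit(to_commit, now)
        store.expire(now)
    return store
```

In this model, `run(commit_all=False).fetch("idle-0")` returns `None` because the idle partition's offset has aged out, which is the point at which {{auto.offset.reset}} would take over after a restart. With `commit_all=True` the idle offset is refreshed on every commit and survives.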