[ 
https://issues.apache.org/jira/browse/KAFKA-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-5510:
-----------------------------------
    Description: 
Currently, Streams commits offsets only for partitions from which it processed 
records. Thus, if a partition does not receive any data for longer than 
{{offsets.retention.minutes}} (default 1 day), the latest committed offset gets 
lost. On failure or restart, {{auto.offset.reset}} kicks in, potentially resulting 
in reprocessing old data.
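
To illustrate the failure mode, here is a toy model of per-partition offset 
retention. It is only a sketch and not the broker implementation: the class 
{{OffsetRetentionSketch}}, its methods, and the partition names "input-0" and 
"input-1" are made up for illustration, and time is counted in minutes.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of per-partition offset retention; NOT the actual broker code.
class OffsetRetentionSketch {
    // Stand-in for offsets.retention.minutes (1 day, as stated in the description).
    static final long RETENTION_MINUTES = 24 * 60;

    // partition -> committed offset, and partition -> minute of its last commit
    private final Map<String, Long> committed = new HashMap<>();
    private final Map<String, Long> lastCommitMinute = new HashMap<>();

    void commit(String partition, long offset, long nowMinute) {
        committed.put(partition, offset);
        lastCommitMinute.put(partition, nowMinute);
    }

    // Drop every offset whose last commit is older than the retention time.
    void expire(long nowMinute) {
        lastCommitMinute.entrySet()
                .removeIf(e -> nowMinute - e.getValue() > RETENTION_MINUTES);
        committed.keySet().retainAll(lastCommitMinute.keySet());
    }

    // Returns the committed offset, or null if it expired
    // (in which case auto.offset.reset would decide where to resume).
    Long fetch(String partition) {
        return committed.get(partition);
    }

    // Hourly commits for two days, but only for the partition that has data.
    static OffsetRetentionSketch simulateIdlePartition() {
        OffsetRetentionSketch store = new OffsetRetentionSketch();
        store.commit("input-0", 100, 0);
        store.commit("input-1", 50, 0); // last commit for the idle partition
        for (long minute = 60; minute <= 2 * 24 * 60; minute += 60) {
            store.commit("input-0", 100 + minute, minute);
            store.expire(minute);
        }
        return store;
    }
}
```

After {{simulateIdlePartition()}}, {{fetch("input-1")}} returns null even though 
the application never stopped, while {{fetch("input-0")}} still returns an offset.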

Thus, Streams should commit _all_ offsets on a regular basis. Not sure what the 
overhead of a commit is -- if it's too expensive to commit all offsets on every 
regular commit, we could also have a second config that specifies a 
"commit.all.interval".
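
A minimal sketch of how such a second interval could decide which offsets go 
into a commit. This is not existing Streams code: the class 
{{CommitAllSketch}}, the method {{offsetsToCommit}}, and the use of plain 
strings for partitions are assumptions for illustration, and 
"commit.all.interval" is only the config name proposed above.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper: picks the offsets to include in the next commit.
class CommitAllSketch {
    private final long commitAllIntervalMs; // proposed "commit.all.interval"
    private long lastCommitAllMs;

    CommitAllSketch(long commitAllIntervalMs, long nowMs) {
        this.commitAllIntervalMs = commitAllIntervalMs;
        this.lastCommitAllMs = nowMs;
    }

    // positions: current position of every assigned partition
    // processed: partitions that had records processed since the last commit
    Map<String, Long> offsetsToCommit(Map<String, Long> positions,
                                      Set<String> processed,
                                      long nowMs) {
        // Once the interval has elapsed, include idle partitions as well,
        // which refreshes their commit timestamp on the broker.
        boolean commitAll = nowMs - lastCommitAllMs >= commitAllIntervalMs;
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, Long> e : positions.entrySet()) {
            if (commitAll || processed.contains(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        if (commitAll) {
            lastCommitAllMs = nowMs;
        }
        return result;
    }
}
```

With a plain consumer, the resulting map would roughly correspond to what is 
passed to {{commitSync(offsets)}}; setting the interval to 0 makes every commit 
a "commit all".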

This relates to https://issues.apache.org/jira/browse/KAFKA-3806, so we should 
sync to get a solid overall solution.

At the same time, it might be better to change the semantics of 
{{offsets.retention.minutes}} in the first place. It might be better to apply 
this setting only if the consumer group is completely dead (and not on a "last 
commit" and "per partition" basis). Thus, this JIRA would be a workaround fix 
if core cannot be changed quickly enough.
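
For comparison, a toy sketch of the group-level semantics. It assumes that 
"completely dead" means "the group has had no active members for longer than 
the retention time"; that reading, the class {{GroupLevelRetentionSketch}}, and 
all of its methods are assumptions for illustration, not broker code.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: offsets expire per group, not per partition and last commit.
class GroupLevelRetentionSketch {
    // Stand-in for offsets.retention.minutes (1 day).
    static final long RETENTION_MINUTES = 24 * 60;

    private final Map<String, Long> committed = new HashMap<>();
    private int activeMembers = 0;
    private long emptySinceMinute = 0; // only meaningful while the group is empty

    void join() {
        activeMembers++;
    }

    void leave(long nowMinute) {
        if (activeMembers > 0 && --activeMembers == 0) {
            emptySinceMinute = nowMinute; // retention clock starts here
        }
    }

    void commit(String partition, long offset) {
        committed.put(partition, offset);
    }

    // Offsets are dropped only once the whole group has been empty for
    // longer than the retention time, however old the last commit is.
    void expire(long nowMinute) {
        if (activeMembers == 0
                && nowMinute - emptySinceMinute > RETENTION_MINUTES) {
            committed.clear();
        }
    }

    Long fetch(String partition) {
        return committed.get(partition);
    }
}
```

In this model an idle partition keeps its committed offset for as long as any 
member of the group is alive, so no periodic "commit all" would be needed.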


  was:
Currently, Streams commits offsets only for partitions from which it processed 
records. Thus, if a partition does not receive any data for longer than 
{{offsets.retention.minutes}} (default 1 day), the latest committed offset gets 
lost. On failure or restart, {{auto.offset.reset}} kicks in, potentially resulting 
in reprocessing old data.

Thus, Streams should commit _all_ offsets on a regular basis. Not sure what the 
overhead of a commit is -- if it's too expensive to commit all offsets on every 
regular commit, we could also have a second config that specifies a 
"commit.all.interval".

This relates to https://issues.apache.org/jira/browse/KAFKA-3806, so we should 
sync to get a solid overall solution.

At the same time, it might be better to change the semantics of 
{{offsets.retention.minutes}} in the first place.



> Streams should commit all offsets regularly
> -------------------------------------------
>
>                 Key: KAFKA-5510
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5510
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Matthias J. Sax
>
> Currently, Streams commits offsets only for partitions from which it processed 
> records. Thus, if a partition does not receive any data for longer than 
> {{offsets.retention.minutes}} (default 1 day), the latest committed offset 
> gets lost. On failure or restart, {{auto.offset.reset}} kicks in, potentially 
> resulting in reprocessing old data.
> Thus, Streams should commit _all_ offsets on a regular basis. Not sure what 
> the overhead of a commit is -- if it's too expensive to commit all offsets on 
> every regular commit, we could also have a second config that specifies a 
> "commit.all.interval".
> This relates to https://issues.apache.org/jira/browse/KAFKA-3806, so we 
> should sync to get a solid overall solution.
> At the same time, it might be better to change the semantics of 
> {{offsets.retention.minutes}} in the first place. It might be better to apply 
> this setting only if the consumer group is completely dead (and not on a "last 
> commit" and "per partition" basis). Thus, this JIRA would be a workaround fix 
> if core cannot be changed quickly enough.



