[ https://issues.apache.org/jira/browse/KAFKA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240608#comment-16240608 ]
Guozhang Wang commented on KAFKA-2758:
--------------------------------------

[~jjkoshy] That's a good point. The main motivation for 1) is services like MM (MirrorMaker), where a commit request may contain a large number of partitions, many of which carry the same offsets as before; the hope is to reduce the request size in such scenarios. I'm wondering whether that is still a good trade-off against the complexity of modifying the server-side offset-commit handling to still update the timestamps for this group id (I think it primarily depends on how much network bandwidth we can actually save in practice).

> Improve Offset Commit Behavior
> ------------------------------
>
>                 Key: KAFKA-2758
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2758
>             Project: Kafka
>          Issue Type: Improvement
>          Components: consumer
>            Reporter: Guozhang Wang
>              Labels: newbiee, reliability
>
> There are two scenarios of offset committing that we can improve:
> 1) We can filter out the partitions whose committed offset is equal to the consumed offset, meaning no new messages have been consumed from that partition, so the partition does not need to be included in the commit request (see the first sketch below).
> 2) We can make a commit request right after resetting to a fetch/consume position, either according to the reset policy (e.g. on consumer startup, or when handling an out-of-range offset) or through {code}seek{code}, so that if the consumer fails right after such an event, upon recovery it restarts from the reset position instead of resetting again. Otherwise this can lead to data loss, for example when we use "largest" as the reset policy while new messages keep arriving on the fetched partitions (see the second sketch below).
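A minimal client-side sketch of the filtering idea in 1), assuming we do it with the public consumer API; the change actually proposed here would live inside the consumer's coordinator internals (or the broker), and the helper name below is hypothetical:

{code:java}
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class FilteredCommitExample {

    // Build a commit map that skips partitions whose consumed position has
    // not moved past the last committed offset, so they add nothing to the
    // commit request except size.
    static Map<TopicPartition, OffsetAndMetadata> offsetsWorthCommitting(
            KafkaConsumer<?, ?> consumer) {
        Map<TopicPartition, OffsetAndMetadata> toCommit = new HashMap<>();
        for (TopicPartition tp : consumer.assignment()) {
            long position = consumer.position(tp);
            OffsetAndMetadata committed = consumer.committed(tp);
            // Keep the partition only if we consumed past the committed
            // offset (or nothing has been committed yet).
            if (committed == null || position > committed.offset())
                toCommit.put(tp, new OffsetAndMetadata(position));
        }
        return toCommit;
    }
}
{code}

The caller would then issue {code}consumer.commitSync(offsetsWorthCommitting(consumer)){code}. Note the trade-off raised in the comment above: if the broker also tracks per-partition commit timestamps, omitted partitions would need their timestamps refreshed server-side for the group.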
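And a sketch of 2): committing immediately after an explicit {code}seek{code} so the reset position survives a crash; the partition and offset arguments are placeholders:

{code:java}
import java.util.Collections;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommitAfterSeekExample {

    // After resetting the position (here via an explicit seek), commit it
    // right away: if the consumer crashes before its next periodic commit,
    // it resumes from resetOffset instead of triggering another reset,
    // which under a "largest" reset policy could silently skip messages
    // that arrived in the meantime.
    static void seekAndCommit(KafkaConsumer<?, ?> consumer,
                              TopicPartition tp, long resetOffset) {
        consumer.seek(tp, resetOffset);
        consumer.commitSync(
                Collections.singletonMap(tp, new OffsetAndMetadata(resetOffset)));
    }
}
{code}

The same pattern would apply when the position is reset by the policy itself (startup, out-of-range handling) rather than by a user-level seek.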