[ https://issues.apache.org/jira/browse/KAFKA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240608#comment-16240608 ]

Guozhang Wang commented on KAFKA-2758:
--------------------------------------

[~jjkoshy] That's a good point. The main motivation for 1) is services like 
MM, where a commit request may contain a large number of partitions, many of 
which carry offsets that have not changed since the last commit; the hope is 
to reduce the request size in such scenarios. I'm wondering whether this is 
still a good trade-off given the added complexity of modifying the 
server-side offset-commit handling to update the timestamps for this group id 
(I think that primarily depends on how much network bandwidth we can save in 
practice).

> Improve Offset Commit Behavior
> ------------------------------
>
>                 Key: KAFKA-2758
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2758
>             Project: Kafka
>          Issue Type: Improvement
>          Components: consumer
>            Reporter: Guozhang Wang
>              Labels: newbie, reliability
>
> There are two scenarios of offset committing that we can improve:
> 1) we can filter out the partitions whose committed offset is equal to the 
> consumed offset, meaning there are no newly consumed messages from this 
> partition and hence we do not need to include this partition in the commit 
> request.
> 2) we can make a commit request right after resetting to a fetch / consume 
> position, either according to the reset policy (e.g. on consumer startup, 
> or when handling an out-of-range offset, etc.) or through a {code} seek {code} 
> call, so that if the consumer fails right after these events, upon recovery 
> it restarts from the reset position instead of resetting again: otherwise 
> this can lead to, for example, data loss if we use "largest" as the reset 
> policy while new messages are arriving on the fetched partitions (a sketch 
> follows below the description).
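
A minimal sketch of the behavior described in 2), done as a workaround at the 
application level with the public consumer API (the helper name is 
hypothetical): the new position is committed immediately after the seek, so a 
crash right afterwards resumes from it instead of triggering another reset.

{code:java}
import java.util.Collections;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommitAfterSeek {
    /**
     * Reset the fetch position and persist it right away, so that a failure
     * immediately after the seek does not force another reset (which, with
     * the "largest" policy, could skip messages that arrive in the meantime).
     */
    public static void seekAndCommit(KafkaConsumer<?, ?> consumer,
                                     TopicPartition tp,
                                     long offset) {
        consumer.seek(tp, offset);
        consumer.commitSync(
            Collections.singletonMap(tp, new OffsetAndMetadata(offset)));
    }
}
{code}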


