[ https://issues.apache.org/jira/browse/KAFKA-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253655#comment-16253655 ]
John Crowley commented on KAFKA-4682:
-------------------------------------
Just found this entry; I had previously commented on
https://issues.apache.org/jira/browse/KAFKA-3806
Would it be possible to allow offsets.retention.minutes to be set per groupId
(in the same way that retention.ms can be set per topic)?
This would keep a fairly short default (1 day, as is current) for removing
abandoned groupId metadata, yet let the user indicate that a particular
groupId should be handled differently. The example in KAFKA-3806 was a PubSub
using Kafka as a persistent, reliable store supporting multiple subscribers.
Some of the source data has very low volatility, e.g. next year's holiday
calendar for a company, which probably only changes once a year. A consumer
must still poll in case an error-correcting update is posted, but in the
normal case it will not do a real commit for 12 months!
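
The per-group override suggested above could be sketched roughly as follows.
This is only an illustration of the lookup and expiry check, not broker code:
Kafka has no per-group offsets retention setting, and every name here
(`retention_minutes_for`, `offset_expired`, the `overrides` mapping, the
`holiday-calendar` group) is hypothetical.

```python
from typing import Mapping

# The 1-day default mentioned above, expressed in minutes.
DEFAULT_RETENTION_MINUTES = 24 * 60


def retention_minutes_for(group_id: str,
                          overrides: Mapping[str, int],
                          default: int = DEFAULT_RETENTION_MINUTES) -> int:
    """Return the retention for a group, falling back to the default.

    `overrides` is a hypothetical per-groupId configuration, analogous to
    a per-topic retention.ms override.
    """
    return overrides.get(group_id, default)


def offset_expired(group_id: str,
                   commit_ts_ms: int,
                   now_ms: int,
                   overrides: Mapping[str, int]) -> bool:
    """True if the committed offset is older than the group's retention."""
    retention_ms = retention_minutes_for(group_id, overrides) * 60 * 1000
    return now_ms - commit_ts_ms > retention_ms
```

With such a lookup, a group reading the holiday calendar could be given a
retention of more than a year while every other group keeps the short default.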
> Committed offsets should not be deleted if a consumer is still active
> (KIP-211)
> -------------------------------------------------------------------------------
>
> Key: KAFKA-4682
> URL: https://issues.apache.org/jira/browse/KAFKA-4682
> Project: Kafka
> Issue Type: Bug
> Reporter: James Cheng
> Assignee: Vahid Hashemian
> Labels: kip
>
> Kafka will delete committed offsets that are older than
> offsets.retention.minutes.
> If there is an active consumer on a low traffic partition, it is possible
> that Kafka will delete the committed offset for that consumer. Once the
> offset is deleted, a restart or a rebalance of that consumer will cause the
> consumer to not find any committed offset and start consuming from
> earliest/latest (depending on auto.offset.reset). I'm not sure, but a broker
> restart or coordinator failover might also cause you to start reading from
> the position given by auto.offset.reset.
> I think that Kafka should only delete offsets for inactive consumers. The
> timer should only start after a consumer group goes inactive. For example, if
> a consumer group goes inactive, then after 1 week, delete the offsets for
> that consumer group. This is a solution that [~junrao] mentioned in
> https://issues.apache.org/jira/browse/KAFKA-3806?focusedCommentId=15323521&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15323521
> The current workarounds are to:
> # Commit an offset on every partition you own on a regular basis, making sure
> that it is more frequent than offsets.retention.minutes (a broker-side
> setting that a consumer might not be aware of)
> or
> # Turn the value of offsets.retention.minutes up very high. You have to make
> sure it is longer than the longest idle period of any low-traffic topic that
> you want to support. For example, if you want to support a topic where
> someone produces once a month, you would have to set
> offsets.retention.minutes to at least 1 month.
> or
> # Turn on enable.auto.commit (this is essentially #1, but easier to
> implement).
> None of these are ideal.
> #1 can be spammy. It requires that your consumers know something about how
> the brokers are configured. Sometimes it is out of your control: MirrorMaker,
> for example, only commits offsets on partitions where it receives data. It is
> also duplicated logic that you need to put into all of your consumers.
> #2 has disk-space impact on the broker (in __consumer_offsets) as well as
> memory-size on the broker (to answer OffsetFetch).
> #3 I think has the potential for message loss (the consumer might commit
> offsets for messages that are not yet fully processed).
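
The difference between the current expiry rule and the one proposed in the
issue description can be illustrated with a small model. This is a simplified
sketch under stated assumptions, not the broker's implementation: the function
names and the `group_empty_since_ms` parameter are invented for illustration.

```python
from typing import Optional


def expired_current(commit_ts_ms: int, now_ms: int, retention_ms: int) -> bool:
    """Current behaviour as described in the issue: the clock starts at the
    last commit, whether or not the group still has active consumers."""
    return now_ms - commit_ts_ms > retention_ms


def expired_proposed(group_empty_since_ms: Optional[int],
                     now_ms: int,
                     retention_ms: int) -> bool:
    """Proposed behaviour: the clock starts only once the group goes inactive.

    `group_empty_since_ms` is None while the group still has active members
    (a hypothetical representation chosen for this sketch).
    """
    if group_empty_since_ms is None:
        return False  # an active group never loses its offsets
    return now_ms - group_empty_since_ms > retention_ms
```

For an active consumer on a low-traffic partition whose last commit is older
than the retention period, `expired_current` returns True while
`expired_proposed` returns False, which is the case the issue is about.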