[ https://issues.apache.org/jira/browse/KAFKA-12761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tomasz Kaszuba updated KAFKA-12761: ----------------------------------- Description: If I understand correctly the following [KIP-211|https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets] consumer offsets should only be cleared based on having an Empty status: {{Empty}}: The field {{current_state_timestamp}} is set to when group last transitioned to this state. If the group stays in this for {{offsets.retention.minutes}}, the following offset cleanup scheduled task will remove all offsets in the group (as explained above). After a week of not consuming any new messages BUT still connected to the consumer group I had the consumer offsets deleted on restart of the k8s pod. {noformat} 2021-05-06 10:10:04.684 INFO 1 --- [ncurred-pattern] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=ieb-x07-baseline-pc-data-storage-incurred-pattern-86c84635-4c96-4941-b440-5ecd4584d3fd-StreamThread-1-consumer, groupId=ieb-x07-baseline-pc-data-storage-incurred-pattern] Found no committed offset for partition ieb.publish.baseline_pc.incurred_pattern-0 {noformat} I looked at what is happening in the the system topic __consumer_offsets and I see the following: {noformat} 17138150 2021-04-27 07:14:50 [ieb-x07-baseline-pc-data-storage-due-pattern,ieb.publish.baseline_pc.due_pattern,0]::OffsetAndMetadata(offset=646, leaderEpoch=Optional.empty, metadata=AQAAAXkOMJr2, commitTimestamp=1619500490253, expireTimestamp=None) 53670252 2021-05-03 17:44:11 ieb-x07-baseline-pc-data-storage-due-pattern::GroupMetadata(groupId=ieb-x07-baseline-pc-data-storage-due-pattern, generation=13, protocolType=Some(consumer), currentState=Stable, members=Map(ieb-x07-baseline-pc-data-storage-due-pattern-a8f21ea3-3bc0-4dc9-b82c-6b88f9c74008-StreamThread-1-consumer-50603fe4-10f7-432e-b306-115329e82b38 -> MemberMetadata(memberId=ieb-x07-baseline-pc-data-storage-due-pattern-a8f21ea3-3bc0-4dc9-b82c-6b88f9c74008-StreamThread-1-consumer-50603fe4-10f7-432e-b306-115329e82b38, groupInstanceId=Some(null), clientId=ieb-x07-baseline-pc-data-storage-due-pattern-a8f21ea3-3bc0-4dc9-b82c-6b88f9c74008-StreamThread-1-consumer, clientHost=/192.23.194.239, sessionTimeoutMs=10000, rebalanceTimeoutMs=300000, supportedProtocols=List(stream), ))) 65226775 2021-05-06 11:56:13 ieb-x07-baseline-pc-data-storage-due-pattern::GroupMetadata(groupId=ieb-x07-baseline-pc-data-storage-due-pattern, generation=14, protocolType=Some(consumer), currentState=Empty, members=Map()) 65226793 2021-05-06 12:10:00 [ieb-x07-baseline-pc-data-storage-due-pattern,ieb.publish.baseline_pc.due_pattern,0]::NULL 65226795 2021-05-06 12:10:03 ieb-x07-baseline-pc-data-storage-due-pattern::GroupMetadata(groupId=ieb-x07-baseline-pc-data-storage-due-pattern, generation=15, protocolType=Some(consumer), currentState=Stable, members=Map(ieb-x07-baseline-pc-data-storage-due-pattern-0fb01327-b21e-4be7-851a-9985e381f8b8-StreamThread-1-consumer-efb312c3-9c24-4088-a5e0-563a3d52c944 -> MemberMetadata(memberId=ieb-x07-baseline-pc-data-storage-due-pattern-0fb01327-b21e-4be7-851a-9985e381f8b8-StreamThread-1-consumer-efb312c3-9c24-4088-a5e0-563a3d52c944, groupInstanceId=Some(null), clientId=ieb-x07-baseline-pc-data-storage-due-pattern-0fb01327-b21e-4be7-851a-9985e381f8b8-StreamThread-1-consumer, clientHost=/192.23.193.184, sessionTimeoutMs=10000, rebalanceTimeoutMs=300000, supportedProtocols=List(stream), ))) 65226809 2021-05-06 12:10:09 [ieb-x07-baseline-pc-data-storage-due-pattern,ieb.publish.baseline_pc.due_pattern,0]::OffsetAndMetadata(offset=2, leaderEpoch=Optional.empty, metadata=AQAAAXlBJ/Sy, commitTimestamp=1620295809338, expireTimestamp=None) {noformat} As you can see the last commited offset was on the 27th of April but the group still had status "Stable" on the 3rd of May. It transitioned to "Empty" on the 6th of May when the pod was restarted. Following this you can see the tombstone message set to delete the offsets which corresponds to the streams logs. (UTC+2). For me it looks like the cleanup only took the last commit timestamp into consideration and not the Stable status. Am I misunderstanding how this should work? The {{offsets.retention.minutes}} is the default 7 days The client is a kafka streams client using version 2.5.0 with EOS turned on and the broker is 2.7.0. I have also included the client [^log.txt] was: If I understand correctly the following [KIP-211|https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets] consumer offsets should only be cleared based on having an Empty status: {{Empty}}: The field {{current_state_timestamp}} is set to when group last transitioned to this state. If the group stays in this for {{offsets.retention.minutes}}, the following offset cleanup scheduled task will remove all offsets in the group (as explained above). After a week of not consuming any new messages BUT still connected to the consumer group I had the consumer offsets deleted on restart of the k8s pod. {noformat} 2021-05-06 10:10:04.684 INFO 1 --- [ncurred-pattern] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=ieb-x07-baseline-pc-data-storage-incurred-pattern-86c84635-4c96-4941-b440-5ecd4584d3fd-StreamThread-1-consumer, groupId=ieb-x07-baseline-pc-data-storage-incurred-pattern] Found no committed offset for partition ieb.publish.baseline_pc.incurred_pattern-0 {noformat} I looked at what is happening in the the system topic __consumer_offsets and I see the following: {noformat} 17138150 2021-04-27 07:14:50 [ieb-x07-baseline-pc-data-storage-due-pattern,ieb.publish.baseline_pc.due_pattern,0]::OffsetAndMetadata(offset=646, leaderEpoch=Optional.empty, metadata=AQAAAXkOMJr2, commitTimestamp=1619500490253, expireTimestamp=None) 53670252 2021-05-03 17:44:11 ieb-x07-baseline-pc-data-storage-due-pattern::GroupMetadata(groupId=ieb-x07-baseline-pc-data-storage-due-pattern, generation=13, protocolType=Some(consumer), currentState=Stable, members=Map(ieb-x07-baseline-pc-data-storage-due-pattern-a8f21ea3-3bc0-4dc9-b82c-6b88f9c74008-StreamThread-1-consumer-50603fe4-10f7-432e-b306-115329e82b38 -> MemberMetadata(memberId=ieb-x07-baseline-pc-data-storage-due-pattern-a8f21ea3-3bc0-4dc9-b82c-6b88f9c74008-StreamThread-1-consumer-50603fe4-10f7-432e-b306-115329e82b38, groupInstanceId=Some(null), clientId=ieb-x07-baseline-pc-data-storage-due-pattern-a8f21ea3-3bc0-4dc9-b82c-6b88f9c74008-StreamThread-1-consumer, clientHost=/172.23.194.239, sessionTimeoutMs=10000, rebalanceTimeoutMs=300000, supportedProtocols=List(stream), ))) 65226775 2021-05-06 11:56:13 ieb-x07-baseline-pc-data-storage-due-pattern::GroupMetadata(groupId=ieb-x07-baseline-pc-data-storage-due-pattern, generation=14, protocolType=Some(consumer), currentState=Empty, members=Map()) 65226793 2021-05-06 12:10:00 [ieb-x07-baseline-pc-data-storage-due-pattern,ieb.publish.baseline_pc.due_pattern,0]::NULL 65226795 2021-05-06 12:10:03 ieb-x07-baseline-pc-data-storage-due-pattern::GroupMetadata(groupId=ieb-x07-baseline-pc-data-storage-due-pattern, generation=15, protocolType=Some(consumer), currentState=Stable, members=Map(ieb-x07-baseline-pc-data-storage-due-pattern-0fb01327-b21e-4be7-851a-9985e381f8b8-StreamThread-1-consumer-efb312c3-9c24-4088-a5e0-563a3d52c944 -> MemberMetadata(memberId=ieb-x07-baseline-pc-data-storage-due-pattern-0fb01327-b21e-4be7-851a-9985e381f8b8-StreamThread-1-consumer-efb312c3-9c24-4088-a5e0-563a3d52c944, groupInstanceId=Some(null), clientId=ieb-x07-baseline-pc-data-storage-due-pattern-0fb01327-b21e-4be7-851a-9985e381f8b8-StreamThread-1-consumer, clientHost=/172.23.193.184, sessionTimeoutMs=10000, rebalanceTimeoutMs=300000, supportedProtocols=List(stream), ))) 65226809 2021-05-06 12:10:09 [ieb-x07-baseline-pc-data-storage-due-pattern,ieb.publish.baseline_pc.due_pattern,0]::OffsetAndMetadata(offset=2, leaderEpoch=Optional.empty, metadata=AQAAAXlBJ/Sy, commitTimestamp=1620295809338, expireTimestamp=None) {noformat} As you can see the last commited offset was on the 27th of April but the group still had status "Stable" on the 3rd of May. It transitioned to "Empty" on the 6th of May when the pod was restarted. Following this you can see the tombstone message set to delete the offsets which corresponds to the streams logs. (UTC+2). For me it looks like the cleanup only took the last commit timestamp into consideration and not the Stable status. Am I misunderstanding how this should work? The {{offsets.retention.minutes}} is the default 7 days The client is a kafka streams client using version 2.5.0 with EOS turned on and the broker is 2.7.0. I have also included the client [^log.txt] > Consumer offsets are deleted 7 days after last offset commit instead of EMPTY > status > ------------------------------------------------------------------------------------ > > Key: KAFKA-12761 > URL: https://issues.apache.org/jira/browse/KAFKA-12761 > Project: Kafka > Issue Type: Bug > Components: core > Affects Versions: 2.5.0, 2.7.0 > Reporter: Tomasz Kaszuba > Priority: Major > Attachments: log.txt > > > If I understand correctly the following > [KIP-211|https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets] > consumer offsets should only be cleared based on having an Empty status: > {{Empty}}: The field {{current_state_timestamp}} is set to when group last > transitioned to this state. If the group stays in this for > {{offsets.retention.minutes}}, the following offset cleanup scheduled task > will remove all offsets in the group (as explained above). > After a week of not consuming any new messages BUT still connected to the > consumer group I had the consumer offsets deleted on restart of the k8s pod. > {noformat} > 2021-05-06 10:10:04.684 INFO 1 --- [ncurred-pattern] > o.a.k.c.c.internals.ConsumerCoordinator : [Consumer > clientId=ieb-x07-baseline-pc-data-storage-incurred-pattern-86c84635-4c96-4941-b440-5ecd4584d3fd-StreamThread-1-consumer, > groupId=ieb-x07-baseline-pc-data-storage-incurred-pattern] Found no > committed offset for partition ieb.publish.baseline_pc.incurred_pattern-0 > {noformat} > I looked at what is happening in the the system topic __consumer_offsets and > I see the following: > {noformat} > 17138150 2021-04-27 07:14:50 > [ieb-x07-baseline-pc-data-storage-due-pattern,ieb.publish.baseline_pc.due_pattern,0]::OffsetAndMetadata(offset=646, > leaderEpoch=Optional.empty, metadata=AQAAAXkOMJr2, > commitTimestamp=1619500490253, expireTimestamp=None) > 53670252 2021-05-03 17:44:11 > ieb-x07-baseline-pc-data-storage-due-pattern::GroupMetadata(groupId=ieb-x07-baseline-pc-data-storage-due-pattern, > generation=13, protocolType=Some(consumer), currentState=Stable, > members=Map(ieb-x07-baseline-pc-data-storage-due-pattern-a8f21ea3-3bc0-4dc9-b82c-6b88f9c74008-StreamThread-1-consumer-50603fe4-10f7-432e-b306-115329e82b38 > -> > MemberMetadata(memberId=ieb-x07-baseline-pc-data-storage-due-pattern-a8f21ea3-3bc0-4dc9-b82c-6b88f9c74008-StreamThread-1-consumer-50603fe4-10f7-432e-b306-115329e82b38, > groupInstanceId=Some(null), > clientId=ieb-x07-baseline-pc-data-storage-due-pattern-a8f21ea3-3bc0-4dc9-b82c-6b88f9c74008-StreamThread-1-consumer, > clientHost=/192.23.194.239, sessionTimeoutMs=10000, > rebalanceTimeoutMs=300000, supportedProtocols=List(stream), ))) > 65226775 2021-05-06 11:56:13 > ieb-x07-baseline-pc-data-storage-due-pattern::GroupMetadata(groupId=ieb-x07-baseline-pc-data-storage-due-pattern, > generation=14, protocolType=Some(consumer), currentState=Empty, > members=Map()) > 65226793 2021-05-06 12:10:00 > [ieb-x07-baseline-pc-data-storage-due-pattern,ieb.publish.baseline_pc.due_pattern,0]::NULL > 65226795 2021-05-06 12:10:03 > ieb-x07-baseline-pc-data-storage-due-pattern::GroupMetadata(groupId=ieb-x07-baseline-pc-data-storage-due-pattern, > generation=15, protocolType=Some(consumer), currentState=Stable, > members=Map(ieb-x07-baseline-pc-data-storage-due-pattern-0fb01327-b21e-4be7-851a-9985e381f8b8-StreamThread-1-consumer-efb312c3-9c24-4088-a5e0-563a3d52c944 > -> > MemberMetadata(memberId=ieb-x07-baseline-pc-data-storage-due-pattern-0fb01327-b21e-4be7-851a-9985e381f8b8-StreamThread-1-consumer-efb312c3-9c24-4088-a5e0-563a3d52c944, > groupInstanceId=Some(null), > clientId=ieb-x07-baseline-pc-data-storage-due-pattern-0fb01327-b21e-4be7-851a-9985e381f8b8-StreamThread-1-consumer, > clientHost=/192.23.193.184, sessionTimeoutMs=10000, > rebalanceTimeoutMs=300000, supportedProtocols=List(stream), ))) > 65226809 2021-05-06 12:10:09 > [ieb-x07-baseline-pc-data-storage-due-pattern,ieb.publish.baseline_pc.due_pattern,0]::OffsetAndMetadata(offset=2, > leaderEpoch=Optional.empty, metadata=AQAAAXlBJ/Sy, > commitTimestamp=1620295809338, expireTimestamp=None) {noformat} > As you can see the last commited offset was on the 27th of April but the > group still had status "Stable" on the 3rd of May. It transitioned to "Empty" > on the 6th of May when the pod was restarted. Following this you can see the > tombstone message set to delete the offsets which corresponds to the streams > logs. (UTC+2). > For me it looks like the cleanup only took the last commit timestamp into > consideration and not the Stable status. Am I misunderstanding how this > should work? > The {{offsets.retention.minutes}} is the default 7 days > The client is a kafka streams client using version 2.5.0 with EOS turned on > and the broker is 2.7.0. > I have also included the client [^log.txt] -- This message was sent by Atlassian Jira (v8.3.4#803005)