Tomasz Kaszuba created KAFKA-12761:
--------------------------------------
Summary: Consumer offsets are deleted 7 days after last offset
instead of EMPTY status
Key: KAFKA-12761
URL: https://issues.apache.org/jira/browse/KAFKA-12761
Project: Kafka
Issue Type: Bug
Components: core, streams
Affects Versions: 2.7.0, 2.5.0
Reporter: Tomasz Kaszuba
If I understand correctly the following
[KIP-211|https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets]
consumer offsets should only be cleared based on having an Empty status:
{{Empty}}: The field {{current_state_timestamp}} is set to when group last
transitioned to this state. If the group stays in this for
{{offsets.retention.minutes}}, the following offset cleanup scheduled task will
remove all offsets in the group (as explained above).
After a week of not consuming any new messages BUT still connected to the
consumer group I had the consumer offsets deleted on restart of the k8s pod.
{noformat}
2021-05-06 10:10:04.684 INFO 1 --- [ncurred-pattern]
o.a.k.c.c.internals.ConsumerCoordinator : [Consumer
clientId=ieb-x07-baseline-pc-data-storage-incurred-pattern-86c84635-4c96-4941-b440-5ecd4584d3fd-StreamThread-1-consumer,
groupId=ieb-x07-baseline-pc-data-storage-incurred-pattern] Found no committed
offset for partition ieb.publish.baseline_pc.incurred_pattern-0 {noformat}
I looked at what is happening in the the system topic __consumer_offsets and I
see the following:
{noformat}
17138150 2021-04-27 07:14:50
[ieb-x07-baseline-pc-data-storage-due-pattern,ieb.publish.baseline_pc.due_pattern,0]::OffsetAndMetadata(offset=646,
leaderEpoch=Optional.empty, metadata=AQAAAXkOMJr2,
commitTimestamp=1619500490253, expireTimestamp=None)
53670252 2021-05-03 17:44:11
ieb-x07-baseline-pc-data-storage-due-pattern::GroupMetadata(groupId=ieb-x07-baseline-pc-data-storage-due-pattern,
generation=13, protocolType=Some(consumer), currentState=Stable,
members=Map(ieb-x07-baseline-pc-data-storage-due-pattern-a8f21ea3-3bc0-4dc9-b82c-6b88f9c74008-StreamThread-1-consumer-50603fe4-10f7-432e-b306-115329e82b38
->
MemberMetadata(memberId=ieb-x07-baseline-pc-data-storage-due-pattern-a8f21ea3-3bc0-4dc9-b82c-6b88f9c74008-StreamThread-1-consumer-50603fe4-10f7-432e-b306-115329e82b38,
groupInstanceId=Some(null),
clientId=ieb-x07-baseline-pc-data-storage-due-pattern-a8f21ea3-3bc0-4dc9-b82c-6b88f9c74008-StreamThread-1-consumer,
clientHost=/172.23.194.239, sessionTimeoutMs=10000, rebalanceTimeoutMs=300000,
supportedProtocols=List(stream), )))
65226775 2021-05-06 11:56:13
ieb-x07-baseline-pc-data-storage-due-pattern::GroupMetadata(groupId=ieb-x07-baseline-pc-data-storage-due-pattern,
generation=14, protocolType=Some(consumer), currentState=Empty, members=Map())
65226793 2021-05-06 12:10:00
[ieb-x07-baseline-pc-data-storage-due-pattern,ieb.publish.baseline_pc.due_pattern,0]::NULL
65226795 2021-05-06 12:10:03
ieb-x07-baseline-pc-data-storage-due-pattern::GroupMetadata(groupId=ieb-x07-baseline-pc-data-storage-due-pattern,
generation=15, protocolType=Some(consumer), currentState=Stable,
members=Map(ieb-x07-baseline-pc-data-storage-due-pattern-0fb01327-b21e-4be7-851a-9985e381f8b8-StreamThread-1-consumer-efb312c3-9c24-4088-a5e0-563a3d52c944
->
MemberMetadata(memberId=ieb-x07-baseline-pc-data-storage-due-pattern-0fb01327-b21e-4be7-851a-9985e381f8b8-StreamThread-1-consumer-efb312c3-9c24-4088-a5e0-563a3d52c944,
groupInstanceId=Some(null),
clientId=ieb-x07-baseline-pc-data-storage-due-pattern-0fb01327-b21e-4be7-851a-9985e381f8b8-StreamThread-1-consumer,
clientHost=/172.23.193.184, sessionTimeoutMs=10000, rebalanceTimeoutMs=300000,
supportedProtocols=List(stream), )))
65226809 2021-05-06 12:10:09
[ieb-x07-baseline-pc-data-storage-due-pattern,ieb.publish.baseline_pc.due_pattern,0]::OffsetAndMetadata(offset=2,
leaderEpoch=Optional.empty, metadata=AQAAAXlBJ/Sy,
commitTimestamp=1620295809338, expireTimestamp=None)
{noformat}
As you can see the last commited offset was on the 27th of April but the group
still had status "Stable" on the 3rd of May. It transitioned to "Empty" on the
6th of May when the pod was restarted. Following this you can see the tombstone
message set to delete the offsets which corresponds to the streams logs.
(UTC+2).
For me it looks like the cleanup only took the last commit timestamp into
consideration and not the Stable status. Am I misunderstanding how this should
work?
The client is a kafka streams client using version 2.5.0 with EOS turned on and
the broker is 2.7.0.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)