Akhilesh Dubey created KAFKA-13635:
--------------------------------------

             Summary: Make Consumer Group Protocol resilient to disk issues 
with __consumer_offsets 
                 Key: KAFKA-13635
                 URL: https://issues.apache.org/jira/browse/KAFKA-13635
             Project: Kafka
          Issue Type: Improvement
            Reporter: Akhilesh Dubey


While working with 6.1.1, we experienced offset resets on some consumer groups 
after a disk-full issue (the actual underlying cause was an uncontrolled 
shutdown of Kafka and of the machine).

When the machine and Kafka brokers were restarted, consumer applications 
received {{Found no committed offset for partition <xyz>}}, which triggered 
an offset reset. In our case the reset policy was set to earliest: 
{{Resetting offset for partition <xyz>}}.
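
To make the consumer-side behaviour concrete, here is a minimal Python sketch 
of how a start position is chosen when no committed offset is found. It is a 
simulation, not Kafka client code: {{resolve_start_offset}} and 
{{NoCommittedOffsetError}} are hypothetical names, and only the three policy 
values (earliest, latest, none) correspond to the consumer's 
{{auto.offset.reset}} setting.

```python
from typing import Optional


class NoCommittedOffsetError(Exception):
    """Hypothetical error: no committed offset and the policy is 'none'."""


def resolve_start_offset(committed: Optional[int], policy: str,
                         log_start: int, log_end: int) -> int:
    """Pick the position a consumer starts from (illustrative helper)."""
    if committed is not None:
        # Normal case: resume from the committed offset.
        return committed
    if policy == "earliest":
        # What happened in this report: restart from the beginning.
        return log_start
    if policy == "latest":
        return log_end
    if policy == "none":
        # Surface the missing offset to the application instead of resetting.
        raise NoCommittedOffsetError("no committed offset, reset policy is none")
    raise ValueError(f"unknown reset policy: {policy}")
```

With the policy set to none, a lost offset shows up as an application error 
rather than a silent rewind, at the cost of requiring manual intervention.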

On further investigation, we noticed that {{GroupMetadataManager}} silently 
handled an offset load issue:

ERROR [GroupMetadataManager brokerId=1] Error loading offsets from __consumer_offsets-33 (kafka.coordinator.group.GroupMetadataManager)
org.apache.kafka.common.errors.CorruptRecordException: Record size 0 is less than the minimum record overhead (14)

The error itself is understandable: the uncontrolled shutdown, and possibly 
page cache issues, could have prevented data from being flushed to disk, and 
GroupCoordinator cannot do much if the offsets themselves are missing.
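
As an illustration of how such an error can arise, the Python sketch below 
walks a buffer of length-prefixed log entries and rejects any entry whose 
size field is below a minimum. It is not the broker's code. The 12-byte 
header layout (8-byte offset plus 4-byte size) and the assumption that 
unflushed data reads back as zero bytes are assumptions made for the 
example; the minimum of 14 is taken from the log message above.

```python
import struct

LOG_OVERHEAD = 12          # assumed header: 8-byte offset + 4-byte size
MIN_RECORD_OVERHEAD = 14   # value quoted in the error message


class CorruptRecordError(Exception):
    """Hypothetical stand-in for a corrupt-record error."""


def read_entries(buf: bytes):
    """Yield (offset, payload) for each complete entry in buf."""
    pos = 0
    while pos + LOG_OVERHEAD <= len(buf):
        # Big-endian int64 offset followed by int32 size.
        offset, size = struct.unpack_from(">qi", buf, pos)
        if size < MIN_RECORD_OVERHEAD:
            raise CorruptRecordError(
                f"Record size {size} is less than the minimum "
                f"record overhead ({MIN_RECORD_OVERHEAD})")
        start = pos + LOG_OVERHEAD
        if start + size > len(buf):
            break  # truncated trailing entry, stop reading
        yield offset, buf[start:start + size]
        pos = start + size
```

A valid entry followed by a zero-filled region would yield the first entry 
and then raise with a size of 0, which matches the shape of the error above.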

I would like to request a feature that stops progress, or retries the load, 
when a {{__consumer_offsets}} partition fails to load, instead of continuing 
as if no offsets had been committed.
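
A rough Python sketch of the requested behaviour, under stated assumptions: 
the load is retried a bounded number of times, and a persistent failure is 
raised to the caller rather than treated as an empty offset set. All names 
here ({{load_offsets_with_retry}}, {{OffsetsLoadError}}, the {{load}} 
callable) are hypothetical and do not correspond to existing Kafka APIs.

```python
import time
from typing import Callable, Dict, Optional


class OffsetsLoadError(Exception):
    """Hypothetical error: offsets are unknown, which is not the same as empty."""


def load_offsets_with_retry(load: Callable[[], Dict[str, int]],
                            max_attempts: int = 3,
                            backoff_s: float = 0.0) -> Dict[str, int]:
    """Call load() up to max_attempts times; fail loudly if it never succeeds."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return load()
        except Exception as exc:  # e.g. a corrupt record while reading
            last_error = exc
            if attempt < max_attempts:
                time.sleep(backoff_s)  # wait before the next attempt
    # Do not fall back to "no committed offsets"; report the failure instead.
    raise OffsetsLoadError(
        f"giving up after {max_attempts} attempts") from last_error
```

The design point is the final raise: a caller that receives 
{{OffsetsLoadError}} can refuse to serve offset fetches for that partition, 
so consumers do not fall through to their reset policy.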



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
