Akhilesh Dubey created KAFKA-13635:
--------------------------------------

             Summary: Make Consumer Group Protocol resilient to disk issues with __consumer_offsets
                 Key: KAFKA-13635
                 URL: https://issues.apache.org/jira/browse/KAFKA-13635
             Project: Kafka
          Issue Type: Improvement
            Reporter: Akhilesh Dubey

While working with 6.1.1, we experienced an offset reset on some consumer groups after a disk-full issue (the actual underlying cause was an uncontrolled shutdown of Kafka and of the machine). When the machine and the Kafka brokers were restarted, consumer applications received {{Found no committed offset for partition <xyz>}}, which triggered an offset reset. In our case the reset policy was set to earliest: {{Resetting offset for partition <xyz>}}.

On further investigation, we noticed that {{GroupMetadataManager}} silently handled an offset load failure:

ERROR [GroupMetadataManager brokerId=1] Error loading offsets from __consumer_offsets-33 (kafka.coordinator.group.GroupMetadataManager)
org.apache.kafka.common.errors.CorruptRecordException: Record size 0 is less than the minimum record overhead (14)

There is nothing wrong with the error itself: the uncontrolled shutdown, and possibly page cache issues, could have prevented data from being flushed to disk, and {{GroupCoordinator}} cannot do much if the offsets themselves are missing.

I would like to request a feature to stop progress or retry if a {{__consumer_offsets}} partition fails to load.
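To make the request concrete, here is a minimal Python sketch of the behavior I have in mind. It is a toy model, not Kafka code: {{ToyOffsetStore}}, {{CorruptRecordError}}, {{OffsetsNotLoadedError}} and the {{max_attempts}} parameter are all hypothetical names invented for illustration. The idea is only that a failed load is retried and, if it still fails, is reported as an error instead of being answered as "no committed offset".

```python
class CorruptRecordError(Exception):
    """Hypothetical stand-in for a corrupt-record failure while reading the offsets log."""


class OffsetsNotLoadedError(Exception):
    """Hypothetical error returned instead of 'no committed offset' when the load failed."""


class ToyOffsetStore:
    """Toy in-memory model of one offsets partition (an assumption, not Kafka's design)."""

    def __init__(self, load_fn, max_attempts=3):
        self._load_fn = load_fn            # callable returning {(group, partition): offset}
        self._max_attempts = max_attempts
        self._offsets = None               # None = not loaded; {} = loaded but empty

    def load(self):
        """Try to load the offsets, retrying on corruption. Return True on success."""
        for _ in range(self._max_attempts):
            try:
                self._offsets = dict(self._load_fn())
                return True
            except CorruptRecordError:
                self._offsets = None       # keep the partition marked as unavailable
        return False

    def fetch(self, group, partition):
        """Return the committed offset, or None if the group really has none."""
        if self._offsets is None:
            # Refuse to answer, so the client cannot fall back to its reset policy.
            raise OffsetsNotLoadedError("offsets partition failed to load")
        return self._offsets.get((group, partition))


def always_corrupt():
    raise CorruptRecordError("Record size 0 is less than the minimum record overhead (14)")


store = ToyOffsetStore(always_corrupt)
loaded = store.load()                      # all attempts fail, so this is False
try:
    store.fetch("my-group", 0)
except OffsetsNotLoadedError as err:
    print(f"loaded={loaded}, fetch refused: {err}")
```

The point of the sketch is the distinction between {{None}} (the group never committed) and an explicit error (the offsets could not be read). Whether the real fix should block, retry, or surface an error to clients is an open design question.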