Hasil Sharma created KAFKA-19014:
------------------------------------

             Summary: Potential race condition in remote-log-reader and remote-log-index-cleaner thread
                 Key: KAFKA-19014
                 URL: https://issues.apache.org/jira/browse/KAFKA-19014
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 3.8.1
            Reporter: Hasil Sharma

Chain of events:

*Thread - 1 remote-log-reader*

Fetches the offsetIndex from the indexCache, which internally maps the physical offset index file as a MappedByteBuffer ([here|https://github.com/apache/kafka/blob/cf7029c0264fd7f7b15c2e98acc874ec8c3403f2/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1772]):

OffsetIndex offsetIndex = indexCache.getIndexEntry(segmentMetadata).offsetIndex();

*Thread - 2 index cache thread*

The entry is marked for cleanup, i.e. the physical offset index file is renamed.

*Thread - 3 remote-log-index-cleaner*

The physical offset index file is deleted.

*Thread - 1 remote-log-reader*

Attempts to run a binary search on the MappedByteBuffer, which is now mapped to a non-existent file ([here|https://github.com/apache/kafka/blob/3.8/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1619]):

long upperBoundOffset = offsetIndex.fetchUpperBoundOffset(startOffsetPosition, fetchSize).map(position -> position.offset).orElse(segmentMetadata.endOffset() + 1);

This results in a fatal JVM error (SIGSEGV) with the following stack trace:
{code:java}
Stack: [0x000072ee9112d000,0x000072ee9122d000], sp=0x000072ee9122b360, free space=1016k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 6483 c2 java.nio.DirectByteBuffer.getInt(I)I java.base@17.0.14 (28 bytes) @ 0x000072f23d2f12f1 [0x000072f23d2f12a0+0x0000000000000051]
j org.apache.kafka.storage.internals.log.OffsetIndex.relativeOffset(Ljava/nio/ByteBuffer;I)I+5
j org.apache.kafka.storage.internals.log.OffsetIndex.parseEntry(Ljava/nio/ByteBuffer;I)Lorg/apache/kafka/storage/internals/log/OffsetPosition;+11
j org.apache.kafka.storage.internals.log.OffsetIndex.parseEntry(Ljava/nio/ByteBuffer;I)Lorg/apache/kafka/storage/internals/log/IndexEntry;+3
j org.apache.kafka.storage.internals.log.AbstractIndex.binarySearch(Ljava/nio/ByteBuffer;JLorg/apache/kafka/storage/internals/log/IndexSearchType;Lorg/apache/kafka/storage/internals/log/AbstractIndex$SearchResultType;II)I+30
j org.apache.kafka.storage.internals.log.AbstractIndex.indexSlotRangeFor(Ljava/nio/ByteBuffer;JLorg/apache/kafka/storage/internals/log/IndexSearchType;Lorg/apache/kafka/storage/internals/log/AbstractIndex$SearchResultType;)I+126
j org.apache.kafka.storage.internals.log.AbstractIndex.smallestUpperBoundSlotFor(Ljava/nio/ByteBuffer;JLorg/apache/kafka/storage/internals/log/IndexSearchType;)I+8
{code}
As per the MappedByteBuffer documentation ([here|https://devdocs.io/openjdk~17/java.base/java/nio/mappedbytebuffer]):
{quote}
All or part of a mapped byte buffer may become inaccessible at any time, for example if the mapped file is truncated. An attempt to access an inaccessible region of a mapped byte buffer will not change the buffer's content and will cause an unspecified exception to be thrown either at the time of the access or at some later time. It is therefore strongly recommended that appropriate precautions be taken to avoid the manipulation of a mapped file by this program, or by a concurrently running program, except to read or write the file's content.
{quote}
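
One way to picture the kind of precaution the documentation asks for is a lock shared between the reader and the cleaner. The following is a minimal sketch only. It is not Kafka's code and not a proposed patch for this issue: {{GuardedIndex}}, {{readInt}} and {{cleanup}} are hypothetical names, and the sketch assumes that every access to the mapped buffer can be routed through one wrapper. Readers take the read lock around each buffer access, and cleanup takes the write lock before dropping the buffer and deleting the file, so a reader either completes before cleanup starts or sees that the index is gone.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Optional;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical wrapper for illustration; not a Kafka class. */
final class GuardedIndex {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Path file;
    // Set to null once the index has been cleaned up; guarded by 'lock'.
    private MappedByteBuffer buffer;

    GuardedIndex(Path file) throws IOException {
        this.file = file;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            this.buffer = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /** Reads the int at a byte position, or returns empty if cleanup already ran. */
    Optional<Integer> readInt(int position) {
        lock.readLock().lock();
        try {
            if (buffer == null) {
                return Optional.empty(); // caller must re-fetch or fail gracefully
            }
            return Optional.of(buffer.getInt(position));
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Drops the mapping reference and deletes the file; waits for in-flight readers. */
    void cleanup() throws IOException {
        lock.writeLock().lock();
        try {
            buffer = null;
            // May fail on platforms that refuse to delete a file that is still mapped.
            Files.deleteIfExists(file);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With this shape, a caller that receives an empty result after {{cleanup()}} can re-fetch the index or fail the fetch with an ordinary exception instead of touching a buffer whose backing file is gone. The cost is a lock acquisition on every read, which may or may not be acceptable on the remote fetch path.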