[jira] [Comment Edited] (KAFKA-15609) Corrupted index uploaded to remote tier

Haruki Okada (Jira) Thu, 02 Nov 2023 09:22:44 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782202#comment-17782202
 ]


Haruki Okada edited comment on KAFKA-15609 at 11/2/23 4:12 PM:
---------------------------------------------------------------

I validated the MappedByteBuffer behavior with this Java code: 
[https://gist.github.com/ocadaruma/fc26fc122829c63cb61e14d7fc96896d]

 

When we create two mmaps of the same file, writes to the first one are always 
visible through the second one unless we specify MapMode.PRIVATE.
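
A minimal sketch of that kind of check follows. It is not the code from the gist; the class name, method name, and temp-file prefix are hypothetical, and the expected values assume Linux.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class MmapVisibility {
    // Writes `value` through a first mapping created with `firstMode`,
    // then returns what a second, independent mapping of the same file reads.
    static int writeThenReadViaSecondMapping(MapMode firstMode, int value) throws IOException {
        Path file = Files.createTempFile("mmap-visibility", ".idx");
        file.toFile().deleteOnExit();
        // Pre-size the file so neither map() call has to extend it.
        Files.write(file, new byte[Integer.BYTES]);
        try (FileChannel ch = FileChannel.open(
                file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer first = ch.map(firstMode, 0, Integer.BYTES);
            MappedByteBuffer second = ch.map(MapMode.READ_ONLY, 0, Integer.BYTES);
            first.putInt(0, value); // no force() call
            return second.getInt(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("READ_WRITE: " + writeThenReadViaSecondMapping(MapMode.READ_WRITE, 42));
        System.out.println("PRIVATE:    " + writeThenReadViaSecondMapping(MapMode.PRIVATE, 42));
    }
}
```

With MapMode.READ_WRITE the second mapping should read back 42. With MapMode.PRIVATE the write is copy-on-write, so the second mapping should still read the original 0.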

 

Also, as I understand it, the page cache is mapped directly into the mmap area, 
so even when a file that was written through mmap is read with an ordinary 
read() call, the content should be consistent, at least on Linux.
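
The read() case can be sketched the same way. This is an illustrative check under the assumption of Linux's unified page cache, not code from the gist; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class MmapThenRead {
    // Writes `value` through a shared mapping without calling force(),
    // then reads the file back through a separate channel with ordinary read().
    static long writeViaMmapReadViaChannel(long value) throws IOException {
        Path file = Files.createTempFile("mmap-then-read", ".idx");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[Long.BYTES]);
        try (FileChannel writer = FileChannel.open(
                file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer mmap = writer.map(MapMode.READ_WRITE, 0, Long.BYTES);
            mmap.putLong(0, value); // dirty page stays in the page cache
            try (FileChannel reader = FileChannel.open(file, StandardOpenOption.READ)) {
                ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
                // Loop because a single read() may return fewer bytes than requested.
                while (buf.hasRemaining() && reader.read(buf) != -1) {
                    // keep reading until the buffer is full or EOF
                }
                buf.flip();
                return buf.getLong();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeViaMmapReadViaChannel(0x1122334455667788L) == 0x1122334455667788L);
    }
}
```

If the read goes through the same page cache as the mapping, the value read back should match what was written even though force() was never called.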


was (Author: ocadaruma):
I validated the MappedByteBuffer behavior with this Java code: 
[https://gist.github.com/ocadaruma/fc26fc122829c63cb61e14d7fc96896d]

 

When we create two mmaps from the same file, writes to 1st one are always 
visible to 2nd one unless we specify MapMode.PRIVATE.

> Corrupted index uploaded to remote tier
> ---------------------------------------
>
>                 Key: KAFKA-15609
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15609
>             Project: Kafka
>          Issue Type: Bug
>          Components: Tiered-Storage
>    Affects Versions: 3.6.0
>            Reporter: Divij Vaidya
>            Priority: Minor
>
> While testing Tiered Storage, we have observed corrupt indexes being present 
> in remote tier. One such situation is covered here at 
> https://issues.apache.org/jira/browse/KAFKA-15401. This Jira presents another 
> such possible case of corruption.
> Potential cause of index corruption:
> We want to ensure that the file we are passing to the RSM plugin contains all 
> the data which is present in the MappedByteBuffer, i.e. we should have flushed 
> the MappedByteBuffer to the file using force(). In Kafka, when we close a 
> segment, indexes are flushed asynchronously [1]. Hence, it might be possible 
> that when we are passing the file to RSM, the file doesn't contain flushed 
> data. Hence, we may end up uploading indexes which haven't been flushed yet. 
> Ideally, the contract should enforce that we force flush the content of the 
> MappedByteBuffer before we give the file to RSM. This will ensure that 
> indexes are not corrupted/incomplete.
> [1] 
> [https://github.com/apache/kafka/blob/4150595b0a2e0f45f2827cebc60bcb6f6558745d/core/src/main/scala/kafka/log/UnifiedLog.scala#L1613]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
