[GitHub] [kafka] Johnny-Malizia commented on a change in pull request #8936: KAFKA-10207: Fixed padded timeindex causing premature data deletion

GitBox Tue, 07 Jul 2020 10:56:09 -0700


Johnny-Malizia commented on a change in pull request #8936:
URL: https://github.com/apache/kafka/pull/8936#discussion_r451043580




##########
File path: core/src/main/scala/kafka/log/LogSegment.scala
##########
@@ -87,6 +87,12 @@ class LogSegment private[log] (val log: FileRecords,
       // we will recover the segments above the recovery point in recoverLog()
       // in any case so sanity checking them here is redundant.
       txnIndex.sanityCheck()
+      // Failing to sanity check the timeIndex can result in a scenario where log segments are
+      // prematurely deleted (before breaching retention periods) if the index file was not resized
+      // to disk successfully.
+      // KAFKA-10207
+      timeIndex.sanityCheck()
+      offsetIndex.sanityCheck()

Review comment:
       Thank you for the feedback here. 
   
   While I agree that the trimming logic should be corrected if possible, an issue like this seems to be partly out of our hands. The current logic appears sound and works on a newer version of the JVM. The problem was that reducing the mmapped file's length did not actually take effect, and it failed silently. I'm open to suggestions for working around this, but even if we work around it for this specific JVM and this specific version of ZFS, it seems plausible that the same issue could crop up again with another JVM or another storage driver, whereas checking at read time catches this particularly nasty issue.
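   To make the failure mode concrete, here is a simplified Scala sketch. It assumes an index of fixed 12-byte entries (an 8-byte timestamp followed by a 4-byte relative offset) whose entry count is derived from the file length on reload. `PaddedTimeIndexSketch`, `buildIndex`, `sanityCheck`, and `breachesRetention` are hypothetical names, not Kafka's `TimeIndex` API. In this model, if the trim silently fails, the zero padding reads back as extra entries with timestamp 0, so the segment's "largest" timestamp looks decades old and a time-based retention check fires immediately; checking that the last timestamp does not precede the first catches it.

```scala
import java.nio.ByteBuffer

// Simplified, hypothetical model of a time index; NOT Kafka's TimeIndex class.
// Each entry: 8-byte timestamp followed by a 4-byte offset relative to the base offset.
object PaddedTimeIndexSketch {
  val EntrySize = 12

  final case class Entry(timestamp: Long, relativeOffset: Int)

  // Writes `entries`, then leaves `paddingEntries` zero-filled slots, mimicking a
  // preallocated index file whose trim to its real size silently failed.
  def buildIndex(entries: Seq[Entry], paddingEntries: Int): ByteBuffer = {
    val buf = ByteBuffer.allocate((entries.size + paddingEntries) * EntrySize)
    entries.foreach { e =>
      buf.putLong(e.timestamp)
      buf.putInt(e.relativeOffset)
    }
    // ByteBuffer.allocate zero-fills, so the remaining slots are already padding.
    buf.rewind()
    buf
  }

  // Assumed reload behaviour: the entry count comes from the length alone,
  // so padding is indistinguishable from real entries here.
  def entryCount(buf: ByteBuffer): Int = buf.limit() / EntrySize

  def entryAt(buf: ByteBuffer, n: Int): Entry =
    Entry(buf.getLong(n * EntrySize), buf.getInt(n * EntrySize + 8))

  // Sanity check in the spirit of the one added in the diff: the length must be a
  // whole number of entries and the last timestamp must not precede the first.
  def sanityCheck(buf: ByteBuffer): Either[String, Unit] = {
    val n = entryCount(buf)
    if (buf.limit() % EntrySize != 0)
      Left(s"index length ${buf.limit()} is not a multiple of $EntrySize")
    else if (n > 0 && entryAt(buf, n - 1).timestamp < entryAt(buf, 0).timestamp)
      Left(s"last timestamp ${entryAt(buf, n - 1).timestamp} precedes first ${entryAt(buf, 0).timestamp}")
    else Right(())
  }

  // Simplified time-based retention rule (an assumption for illustration):
  // a segment is deletable once nowMs - largestTimestamp exceeds retentionMs.
  def breachesRetention(largestTimestamp: Long, nowMs: Long, retentionMs: Long): Boolean =
    nowMs - largestTimestamp > retentionMs
}
```

   Under these assumptions, a two-entry index padded with three zero slots reports five entries, its last entry has timestamp 0, and `sanityCheck` returns a `Left` instead of letting that zero timestamp feed the retention check.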
   
   I think a good compromise would be to follow through on what you and @dhruvilshah3 have suggested and apply the sanity check when the index is first loaded: the check itself is cheap enough, and we would avoid loading *every* segment at startup.
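   One way to picture "check when the index is first loaded" is a Scala `lazy val`, which runs its initializer at most once, on first access. The sketch below is a hypothetical wrapper: `LazyCheckedIndex` and its `load`/`check` parameters are illustrative names, not Kafka APIs, and this is not necessarily how the patch or Kafka structures index loading.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical wrapper: defers loading (and sanity checking) an index until its
// first use, so startup does not have to touch every segment's index.
final class LazyCheckedIndex[I](load: () => I, check: I => Unit) {
  // A lazy val is initialized at most once, on first access, and the
  // initialization is thread-safe. If `check` throws, initialization fails
  // and is attempted again on the next access.
  private lazy val checked: I = {
    val index = load()
    check(index)
    index
  }

  def get: I = checked
}

object LazyCheckDemo {
  def main(args: Array[String]): Unit = {
    val loads = new AtomicInteger(0)
    val index = new LazyCheckedIndex[Vector[Long]](
      () => { loads.incrementAndGet(); Vector(1000L, 2000L) },
      ts => require(ts.isEmpty || ts.last >= ts.head, "index is corrupt")
    )
    println(loads.get()) // 0: nothing is loaded at construction time
    index.get
    index.get
    println(loads.get()) // 1: loaded and checked exactly once
  }
}
```

   The design trade-off matches the comment above: the cost of the check is paid per segment only when that segment's index is actually used, rather than for every segment during startup.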
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

