[jira] [Commented] (KAFKA-10471) TimeIndex handling may cause data loss in certain back to back failure

Jun Rao (Jira) Tue, 08 Sep 2020 14:40:11 -0700


    [ https://issues.apache.org/jira/browse/KAFKA-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192492#comment-17192492 ]


Jun Rao commented on KAFKA-10471:
---------------------------------

[~rshekhar]: Thanks for reporting this. This is a very good finding. If the 
TimeIndex doesn't start in a clean state, it can cause multiple things to go 
wrong afterward.

One way to fix this issue is to check for the presence of the clean shutdown
file in LogManager.loadLogs() before any individual log is loaded. We then
delete the clean shutdown file and pass a clean shutdown flag to Log, which uses
it in Log.recoverLog(). That way, in step 4 of the description below, the
TimeIndex will be rebuilt properly.
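A minimal sketch of the proposed ordering, written in Python rather than Kafka's Scala. The names `load_logs`, `Log.recover_log`, and the marker file name `.kafka_cleanshutdown` are illustrative assumptions, not Kafka's actual code. The point is only that the marker is read once and deleted before any log is loaded, so a crash partway through loading cannot leave a stale marker behind.

```python
import os
from dataclasses import dataclass

# Assumed marker file name for illustration; Kafka's real constant may differ.
CLEAN_SHUTDOWN_FILE = ".kafka_cleanshutdown"


@dataclass
class Log:
    """Toy stand-in for a partition log that only records whether recovery ran."""
    name: str
    had_clean_shutdown: bool
    recovered: bool = False

    def recover_log(self) -> None:
        # Recovery is skipped only if the marker was present before *any* log was loaded.
        if not self.had_clean_shutdown:
            self.recovered = True  # stands in for rebuilding indexes, including the TimeIndex


def load_logs(log_dir, log_names):
    marker = os.path.join(log_dir, CLEAN_SHUTDOWN_FILE)
    # Read the marker once, then delete it before loading any individual log,
    # so a crash mid-load cannot leave a stale marker for the next restart.
    had_clean_shutdown = os.path.exists(marker)
    if had_clean_shutdown:
        os.remove(marker)
    logs = []
    for name in log_names:
        log = Log(name, had_clean_shutdown)
        log.recover_log()
        logs.append(log)
    return logs
```

With this ordering, the first `load_logs` call after a clean shutdown skips recovery, but the marker is already gone; if the broker dies before loading finishes, the next `load_logs` call runs recovery for every log, which is where the TimeIndex would be rebuilt.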

> TimeIndex handling may cause data loss in certain back to back failure
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-10471
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10471
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, log
>            Reporter: Rohit Shekhar
>            Priority: Critical
>
> # Log A's active segment goes through a clean shutdown: the time index is
> trimmed to its last filled entry and the clean shutdown marker is set.
> # The broker restarts and loads logs. No recovery runs because of the clean
> shutdown marker; log A comes back with the previous active segment as the
> current one, and its TimeIndex is resized to the maximum size.
> # Before all logs have finished loading, the broker has a hard shutdown,
> leaving the clean shutdown marker in place.
> # The broker restarts again. Log A skips recovery because the clean shutdown
> marker is present, but the TimeIndex code assumes the file resized by the
> previous instance is completely full (it assumes the file is either newly
> created or full of valid entries).
> # The first append to the active segment causes a roll, and the TimeIndex is
> rolled with the timestamp of the "last valid entry", which is 0.
> # The segment's largest timestamp is reported as 0, which can cause premature
> deletion of data due to retention.
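The failure sequence above can be reproduced in miniature. This Python sketch assumes a TimeIndex entry is 12 bytes (an 8-byte timestamp followed by a 4-byte relative offset) and models the loader assumption from step 4: an existing index file is treated as either newly created or completely full of valid entries. The helper names and the tiny `MAX_INDEX_BYTES` are hypothetical, and `is_expired` is a simplified stand-in for time-based retention, not Kafka's implementation.

```python
import struct

# Assumed TimeIndex entry layout: 8-byte timestamp + 4-byte relative offset (big-endian).
ENTRY_FORMAT = ">qi"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 12 bytes
MAX_INDEX_BYTES = ENTRY_SIZE * 8  # hypothetical, tiny preallocated index size


def write_entries(entries):
    """Build an index file holding exactly the given (timestamp_ms, relative_offset) pairs."""
    return bytearray(b"".join(struct.pack(ENTRY_FORMAT, ts, off) for ts, off in entries))


def resize(index, new_size):
    """Preallocate the index by padding it with zero bytes, as on restart (step 2)."""
    return index + bytearray(new_size - len(index))


def load_last_entry(index, newly_created):
    """Return the last entry, trusting that an existing file is completely full (step 4)."""
    entries = 0 if newly_created else len(index) // ENTRY_SIZE
    if entries == 0:
        return None
    start = (entries - 1) * ENTRY_SIZE
    return struct.unpack(ENTRY_FORMAT, index[start:start + ENTRY_SIZE])


def is_expired(largest_timestamp_ms, now_ms, retention_ms):
    """Simplified time-based retention: expire once the newest record is too old."""
    return now_ms - largest_timestamp_ms > retention_ms


# Step 1: clean shutdown leaves a trimmed index holding three real entries.
index = write_entries([(1_600_000_000_000, 0), (1_600_000_001_000, 10), (1_600_000_002_000, 20)])
# Step 2: restart resizes the index to its maximum size; step 3: hard crash before trimming.
index = resize(index, MAX_INDEX_BYTES)
# Step 4: reloading without recovery reads the zero-filled last slot as a real entry.
last_timestamp_ms, last_offset = load_last_entry(index, newly_created=False)

NOW_MS = 1_600_000_100_000
RETENTION_MS = 7 * 24 * 60 * 60 * 1000  # 7 days
```

Here `last_timestamp_ms` comes back as 0, so `is_expired(last_timestamp_ms, NOW_MS, RETENTION_MS)` is true even though the newest real entry (timestamp 1_600_000_002_000) is only 98 seconds old and well within the 7-day window.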



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
