[jira] [Comment Edited] (KAFKA-16082) JBOD: Possible dataloss when moving leader partition

Proven Provenzano (Jira) Tue, 09 Jan 2024 12:50:26 -0800


    [ 
https://issues.apache.org/jira/browse/KAFKA-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804879#comment-17804879
 ]


Proven Provenzano edited comment on KAFKA-16082 at 1/9/24 8:49 PM:
-------------------------------------------------------------------

For case 3:
 
If I understand this correctly, the scenario is that the broker restarts and, 
from the metadata log replay, sees that `dir2` is supposed to own `tp0`. 
However, it doesn't see the log in `dir2` because the failed future replica 
hasn't been renamed, so it will create a new replica for `tp0` in `dir2` and 
populate it with data from other replicas. Can we create a unit test to 
validate this? It may also be possible to reuse the existing future replica, 
so long as the broker at restart went through a stage where the leader of the 
partition was moved to a different broker. The broker can then treat the 
partition as an out-of-sync replica, do the rename, and catch up immediately. 
Note that it cannot do the rename until after the partition leadership has 
been moved away from the broker, in case the broker restarts again.
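
To make the two restart paths above concrete, here is a minimal toy sketch of 
the decision. It is not Kafka code: the class, enum, and method names are 
hypothetical, and the `DEFER_RENAME` outcome (what happens while leadership 
has not yet moved away) is an assumption made for illustration only.

```java
import java.util.Set;

// Toy model only; none of these names exist in Kafka.
final class FutureReplicaRestartSketch {

    enum Action {
        USE_EXISTING_LOG,           // a regular log already sits in the assigned dir
        CREATE_NEW_REPLICA,         // start empty and fetch from other replicas
        RENAME_FUTURE_AND_CATCH_UP, // reuse the future replica as an out-of-sync replica
        DEFER_RENAME                // assumption: hold off until leadership has moved away
    }

    /**
     * @param assignedDir         directory that the metadata log replay says owns the partition
     * @param dirsWithLog         directories holding a regular (renamed) log for the partition
     * @param dirsWithFuture      directories holding a future replica that was never renamed
     * @param reuseFutureReplica  false models the behaviour described first in the comment,
     *                            true models the suggested reuse of the future replica
     * @param leadershipMovedAway whether the partition leader moved to a different broker
     */
    static Action decide(String assignedDir,
                         Set<String> dirsWithLog,
                         Set<String> dirsWithFuture,
                         boolean reuseFutureReplica,
                         boolean leadershipMovedAway) {
        if (dirsWithLog.contains(assignedDir)) {
            return Action.USE_EXISTING_LOG;
        }
        if (reuseFutureReplica && dirsWithFuture.contains(assignedDir)) {
            // The rename is only safe once this broker no longer leads the partition.
            return leadershipMovedAway ? Action.RENAME_FUTURE_AND_CATCH_UP
                                       : Action.DEFER_RENAME;
        }
        return Action.CREATE_NEW_REPLICA;
    }

    public static void main(String[] args) {
        Set<String> logs = Set.of("dir1");    // original leader log
        Set<String> futures = Set.of("dir2"); // failed, un-renamed future replica
        System.out.println(decide("dir2", logs, futures, false, false));
        System.out.println(decide("dir2", logs, futures, true, true));
        System.out.println(decide("dir2", logs, futures, true, false));
    }
}
```

A real unit test would have to exercise the broker's log loading on restart; 
the sketch only pins down which outcome is expected in each case.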

 

 



> JBOD: Possible dataloss when moving leader partition
> ----------------------------------------------------
>
>                 Key: KAFKA-16082
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16082
>             Project: Kafka
>          Issue Type: Bug
>          Components: jbod
>    Affects Versions: 3.7.0
>            Reporter: Proven Provenzano
>            Assignee: Gaurav Narula
>            Priority: Blocker
>             Fix For: 3.7.0
>
>
> There is a possible data loss scenario when using JBOD and moving the 
> partition leader log from one directory to another on the same broker.
> After the destination log has caught up to the source log and the broker 
> has sent an update to the partition assignment, if the broker accepts and 
> commits a new record for the partition, and the broker then restarts and 
> the original partition leader log is lost, the destination log would not 
> contain the new record.
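
The sequence in the description can be replayed with a toy model. This is a 
sketch under stated assumptions, not Kafka code: the class and record names 
are hypothetical, logs are plain lists, and the new record is assumed to 
reach only the source log before the restart.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; none of these names exist in Kafka.
final class JbodMoveDataLossSketch {

    /** Replays the scenario and returns what the destination log holds after the restart. */
    static List<String> destinationAfterRestart() {
        // Leader log in the original directory.
        List<String> sourceLog = new ArrayList<>(List.of("r0", "r1"));
        // The destination (future) log has caught up to the source log.
        List<String> destinationLog = new ArrayList<>(sourceLog);
        // At this point the broker has sent the partition assignment update.

        // A new record is accepted and committed.
        // Assumption: it is appended only to the source log before the restart.
        sourceLog.add("r2");

        // The broker restarts and the original leader log is lost.
        sourceLog.clear();

        return destinationLog;
    }

    public static void main(String[] args) {
        List<String> survivors = destinationAfterRestart();
        System.out.println("destination log: " + survivors);
        System.out.println("committed record r2 lost: " + !survivors.contains("r2"));
    }
}
```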



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
