[
https://issues.apache.org/jira/browse/KAFKA-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804879#comment-17804879
]
Proven Provenzano edited comment on KAFKA-16082 at 1/9/24 8:49 PM:
-------------------------------------------------------------------
For case 3:
If I understand this correctly, the scenario is that the broker restarts and
sees from the metadata log replay that `dir2` is supposed to own `tp0`. However,
it doesn't find the log in `dir2` because the failed future replica hasn't been
renamed, so it will create a new replica for `tp0` in `dir2` and populate it
with data from other replicas. Can we create a unit test to validate this?

It may also be possible to reuse the existing future replica, as long as the
broker went through a stage at restart where leadership of the partition was
moved to a different broker. It can then treat the partition as an out-of-sync
replica, do the rename, and catch up immediately. Note that it cannot do the
rename until after partition leadership has been moved away from the broker,
in case the broker restarts again.
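
To make the two restart behaviours discussed above concrete, here is a minimal sketch of the decision as I read it. This is a toy model, not Kafka code: `ReplicaState`, `Action`, `plan_restart` and the `reuse_future` flag are all hypothetical names introduced for illustration, and the "wait" outcome is my assumption about what "cannot do the rename until leadership has moved" would mean in practice.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    USE_EXISTING_LOG = "use_existing_log"
    CREATE_EMPTY_AND_FETCH = "create_empty_and_fetch"
    RENAME_FUTURE_AND_CATCH_UP = "rename_future_and_catch_up"
    WAIT_FOR_LEADER_MOVE = "wait_for_leader_move"


@dataclass(frozen=True)
class ReplicaState:
    """Hypothetical view of one partition on a restarting broker."""
    assigned_dir: str            # directory the metadata log says owns the partition
    dirs_with_log: frozenset     # directories holding a regular log for it
    dirs_with_future: frozenset  # directories holding a not-yet-renamed future replica
    leader_is_this_broker: bool  # whether this broker is still the partition leader


def plan_restart(state, reuse_future=False):
    """Pick a restart action for the partition described by `state`."""
    # Normal case: the assigned directory already has the log.
    if state.assigned_dir in state.dirs_with_log:
        return Action.USE_EXISTING_LOG

    has_future = state.assigned_dir in state.dirs_with_future
    # Behaviour described first in the comment: ignore any future replica,
    # create a fresh replica and populate it from the other replicas.
    if not (reuse_future and has_future):
        return Action.CREATE_EMPTY_AND_FETCH

    # Proposed alternative: the rename is only allowed once leadership
    # has moved to a different broker.
    if state.leader_is_this_broker:
        return Action.WAIT_FOR_LEADER_MOVE
    return Action.RENAME_FUTURE_AND_CATCH_UP
```

A unit test for the first behaviour would roughly assert that a state with `assigned_dir="dir2"`, no log in `dir2`, and an unrenamed future replica there yields `CREATE_EMPTY_AND_FETCH`.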
> JBOD: Possible dataloss when moving leader partition
> ----------------------------------------------------
>
> Key: KAFKA-16082
> URL: https://issues.apache.org/jira/browse/KAFKA-16082
> Project: Kafka
> Issue Type: Bug
> Components: jbod
> Affects Versions: 3.7.0
> Reporter: Proven Provenzano
> Assignee: Gaurav Narula
> Priority: Blocker
> Fix For: 3.7.0
>
>
> There is a possible data loss scenario
> when using JBOD,
> when moving the partition leader log from one directory to another on the
> same broker:
> after the destination log has caught up to the source log and after the
> broker has sent an update to the partition assignment,
> if the broker accepts and commits a new record for the partition, and then the
> broker restarts and the original partition leader log is lost,
> then the destination log would not contain the new record.
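
The sequence in the description can be walked through with a small simulation. This is only a sketch under simplifying assumptions: logs are modelled as plain Python lists, only the single broker is modelled (replicas on other brokers are ignored), and `simulate_leader_move` and the record names are made up for illustration.

```python
def simulate_leader_move():
    """Replay the steps from the description and return the records that are lost."""
    source_log = ["r0", "r1"]      # partition leader log in the original directory
    future_log = list(source_log)  # destination log has caught up to the source

    # The broker sends the partition assignment update naming the destination.
    assigned_dir = "dir2"

    # Before the logs are swapped, a new record is accepted and committed.
    # It lands only in the source log.
    source_log.append("r2")
    committed = list(source_log)

    # The broker restarts and the original leader log is lost, so only the
    # destination log in the assigned directory survives.
    surviving = {"dir2": future_log}[assigned_dir]

    return [record for record in committed if record not in surviving]


lost = simulate_leader_move()
print(lost)  # ['r2']
```

In this model the committed record `r2` exists only in the lost source log, which is the gap the description points at.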