OmniaGM commented on PR #15230: URL: https://github.com/apache/kafka/pull/15230#issuecomment-1915041492
> Question: why can't we delete these disappeared(deleted) logs directly? I think this can only happen when topic deleting while the node is offline. If so, then deleting them should be fine?

The bug only happens when we delete and recreate the topic while the broker is offline and the partition assignments of the old and new topic overlap on the offline broker. A better way to explain it is with the example I mentioned in the Jira:

1. We had an offline broker that held replicas for partitions 0, 3, 4, 5, 7, 8, 9 of topic `foo.test` with id `MfuZbwdmSMaiSa0g6__TPg`. We deleted the topic while the broker was offline, so the logs weren't physically deleted.
2. Before bringing the offline broker back, we recreated `foo.test`, which now has partitions 0, 1, 2, 7, 8, 9 assigned to the same offline broker, but the topic id is now `RzalpqQ9Q7ub2M2afHxY4Q`. **Notice here that partitions 0, 7, 8, 9 are common between the assignment of the deleted topic (topic_id: MfuZbwdmSMaiSa0g6__TPg) and the recreated topic (topic_id: RzalpqQ9Q7ub2M2afHxY4Q).**
3. When the broker comes back, `ReplicaManager` and `LogManager` do not complain and move on to create the log dirs for the missing partitions 1, 2, 6, as there are already dirs for 0, 7, 8, 9 (despite the fact that the topic id is wrong here).
4. Later, `BrokerMetadataPublisher` detects the strays (which are 0, 7, 8, 9) and deletes them.
5. Now the broker is online but has only partitions 1, 2, 6, and there is no way to make the broker create partitions 0, 7, 8, 9 without restarting it.

This leads to permanent under-replication until someone restarts the broker. It could get worse on larger clusters if more than one broker is offline and the same issue happens to all of them. We might end up with partitions permanently at-min-isr or permanently under-min-isr.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
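
The sequence described in the comment can be sketched as plain set arithmetic. This is a toy model under stated assumptions, not Kafka's actual code: `Log` and `simulate_restart` are hypothetical names, log directories are matched by topic name and partition only when deciding what to create, and stray detection compares topic ids. The inputs are the assignments exactly as listed in steps 1 and 2.

```python
from dataclasses import dataclass

# Topic ids taken from the example in the comment.
OLD_ID = "MfuZbwdmSMaiSa0g6__TPg"
NEW_ID = "RzalpqQ9Q7ub2M2afHxY4Q"


@dataclass(frozen=True)
class Log:
    """Hypothetical stand-in for one partition's log directory."""
    topic: str
    partition: int
    topic_id: str


def simulate_restart(on_disk, assigned, current_id, topic="foo.test"):
    """Toy model of the restart sequence; returns (created, strays, remaining)
    as sets of partition numbers."""
    # Step 3 (assumed behavior): directories are looked up by name and
    # partition only, so a directory left over from the deleted topic
    # counts as "already there" and is not recreated.
    existing = {log.partition for log in on_disk if log.topic == topic}
    created = set(assigned) - existing
    logs = set(on_disk) | {Log(topic, p, current_id) for p in created}

    # Step 4 (assumed behavior): stray detection compares topic ids, so
    # every log that still carries the old id is deleted.
    strays = {log for log in logs if log.topic_id != current_id}
    remaining = logs - strays
    return (
        created,
        {log.partition for log in strays},
        {log.partition for log in remaining},
    )


# Step 1: logs left on the offline broker by the deleted topic.
on_disk = {Log("foo.test", p, OLD_ID) for p in (0, 3, 4, 5, 7, 8, 9)}
# Step 2: partitions of the recreated topic assigned to the same broker.
assigned = {0, 1, 2, 7, 8, 9}

created, strays, remaining = simulate_restart(on_disk, assigned, NEW_ID)
# Step 5: assigned partitions the broker no longer hosts after stray deletion.
missing = assigned - remaining
```

In this model, `missing` is exactly the overlap between the old and new assignments, which is why deleting the stray logs directly is not enough on its own: something must also recreate those partitions under the new topic id.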