OmniaGM commented on PR #15230:
URL: https://github.com/apache/kafka/pull/15230#issuecomment-1915041492

   > Question: why can't we delete these disappeared (deleted) logs directly? I think this can only happen when a topic is deleted while the node is offline. If so, then deleting them should be fine?
   
   The bug only happens when we delete and recreate the topic while the broker is offline, and the partition assignments of the old and new topics overlap on that offline broker.
   
   It is easier to explain with the example I mentioned in the Jira:
   1. We had an offline broker that held replicas for partitions 0, 3, 4, 5, 7, 8, 9 of topic `foo.test` with id `MfuZbwdmSMaiSa0g6__TPg`. We deleted the topic while the broker was offline, so its log dirs were not physically deleted.
   2. Then, before bringing the offline broker back, we recreated `foo.test`, which now has partitions 0, 1, 2, 7, 8, 9 assigned to the same offline broker, but the topic id is now `RzalpqQ9Q7ub2M2afHxY4Q`. **Notice that partitions 0, 7, 8, 9 are common to the assignments of the deleted topic (topic_id: MfuZbwdmSMaiSa0g6__TPg) and the recreated topic (topic_id: RzalpqQ9Q7ub2M2afHxY4Q).**
   3. When the broker comes back, the `ReplicaManager` and `LogManager` will not complain and will move on to create log dirs only for the missing partitions 1, 2, 6, as there are already dirs for 0, 7, 8, 9 (despite the fact that their topic id is wrong).
   4. Later, `BrokerMetadataPublisher` will detect the strays (which are 0, 7, 8, 9) and delete them.
   5. Now the broker is online but has only partitions 1, 2, 6, and we have no way to make the broker create partitions 0, 7, 8, 9 other than restarting it.
   
   This leads to permanent under-replication until someone restarts the broker. It could get worse on larger clusters if more than one broker is offline and the same issue happens to all of them. We might end up with partitions that are permanently at-min-isr or permanently under-min-isr.
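
The sequence in steps 1 to 5 can be sketched as a small set-based model. This is only an illustration, not Kafka code: `simulate_restart` and its arguments are hypothetical names. Step 2 lists partitions 0, 1, 2, 7, 8, 9 while steps 3 and 5 mention partition 6, so the sketch assumes partition 6 is part of the new assignment. In this model the leftover dirs for 3, 4, 5 also count as strays; the ones that matter are those that collide with the new assignment.

```python
# Hypothetical model of the scenario; none of these names are Kafka APIs.
OLD_TOPIC_ID = "MfuZbwdmSMaiSa0g6__TPg"
NEW_TOPIC_ID = "RzalpqQ9Q7ub2M2afHxY4Q"


def simulate_restart(on_disk, assigned, new_topic_id):
    """on_disk: partition -> topic id recorded in the existing log dir.
    assigned: partitions of the recreated topic assigned to this broker."""
    # Step 3: existing dirs are matched by partition only, so new dirs are
    # created just for partitions that have no dir at all.
    created = {p for p in assigned if p not in on_disk}
    disk = dict(on_disk)
    for p in created:
        disk[p] = new_topic_id

    # Step 4: every dir whose topic id differs from the current one is a
    # stray and gets deleted.
    strays = {p for p, topic_id in disk.items() if topic_id != new_topic_id}
    for p in strays:
        del disk[p]

    # Step 5: assigned partitions that now have no log dir on the broker.
    hosted = set(disk)
    missing = assigned - hosted
    return created, strays & assigned, hosted, missing


# Dirs left behind by the deleted topic (step 1).
on_disk = {p: OLD_TOPIC_ID for p in (0, 3, 4, 5, 7, 8, 9)}
# Assumed assignment of the recreated topic (step 2, plus partition 6).
assigned = {0, 1, 2, 6, 7, 8, 9}

created, deleted_assigned, hosted, missing = simulate_restart(
    on_disk, assigned, NEW_TOPIC_ID
)
print(sorted(created))           # [1, 2, 6]
print(sorted(deleted_assigned))  # [0, 7, 8, 9]
print(sorted(hosted))            # [1, 2, 6]
print(sorted(missing))           # [0, 7, 8, 9]
```

The partitions in `missing` are the ones that stay under-replicated on this broker until it is restarted.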

