> On April 28, 2014, 7:43 p.m., Neha Narkhede wrote:
> > core/src/main/scala/kafka/controller/ControllerChannelManager.scala, line 
> > 210
> > <https://reviews.apache.org/r/20745/diff/1/?file=569160#file569160line210>
> >
> >     better to not leak the logic from delete topic manager here. It is 
> > worth exposing the check as an API in DeleteTopicManager. In any case, can 
> > you explain the motivation behind your change?
> 
> Timothy Chen wrote:
>     Sorry forgot to add more comments to the rb, have been chatting mostly 
> with Jun about my fixes.
>     One test that was failing in delete topic test was a test that was 
> handling request while topic being deleted, and it made sure it can never 
> finish by shutting down one of the follower broker. Sending a fetch request 
> to a topic being deleted returns a NotLeaderForPartition error code instead 
> of expected UnknownTopicOrPartition.
>      
>     The cause of this is that as part of the metadata cache changes in Kafka 
> api, we removed the check if a topic exists in the cache when serving 
> fetching request and now it tries to read message set from the replica 
> manager, and even though a stop replica request with delete partition is 
> already been sent and processed, the partition was somehow still in replica 
> manager and it returns NotLeaderForPartition exception since there is no 
> leader for this topic.
>     
>     I found out that the partition was being recreated in the replica 
> manager, because in the process of retrying deleting the same topic it change 
> the failed replica state into OfflineReplica again that triggers a 
> LeaderAndIsrRequest to all brokers for shrinking the Isr for that replica. 
>     
>     The brokers that received the LeaderAndIsrRequest recreates the partition 
> with getOrCreatePartition, and therefore future fetch requests hits it and 
> throws that error code.
>     
>     I think the correct thing to do is to not send LeaderAndIsrRequest if the 
> topic is being deleted when bring replica to an offline state, as it is not 
> necessary anymore.
>

Could you explain the situation in a bit more details? It would be good to 
include the sequence of LeaderIsr and StopReplica requests that the broker 
sees. In particular, I am not sure why a partition still exists in replica 
manager if a stop replica request with delete partition has already been 
received.


- Jun


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20745/#review41626
-----------------------------------------------------------


On April 29, 2014, 12:08 a.m., Timothy Chen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20745/
> -----------------------------------------------------------
> 
> (Updated April 29, 2014, 12:08 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1397
>     https://issues.apache.org/jira/browse/KAFKA-1397
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> Fix delete topic tests and deadlock
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/controller/ControllerChannelManager.scala 
> 919aeb26f93d2fc34d873cfb3dbfab7235a9d635 
>   core/src/main/scala/kafka/controller/KafkaController.scala 
> 933de9dd324c7086efe6aa610335ef370d9e9c12 
>   core/src/main/scala/kafka/controller/ReplicaStateMachine.scala 
> 0e47dac8cbf65a86d053a3371a18af467afd70ae 
>   core/src/main/scala/kafka/controller/TopicDeletionManager.scala 
> e4bc2439ce1933c7c7571d255464ee678226a6cb 
>   core/src/main/scala/kafka/log/LogManager.scala 
> ac67f081e6219fd2181479e7a2bb88ea6044e6cc 
>   core/src/main/scala/kafka/server/KafkaApis.scala 
> 0b668f230c8556fdf08654ce522a11847d0bf39b 
>   core/src/main/scala/kafka/server/KafkaServer.scala 
> c208f83bed7fb91f07fae42f2b66892e6d46fecc 
>   core/src/test/scala/unit/kafka/admin/DeleteTopicTest.scala 
> 9c29e144bba2c9bafa91941b6ca5c263490693b3 
>   core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 
> b1c4ce9f66aa86c68f2e6988f67546fa63ed1f9b 
> 
> Diff: https://reviews.apache.org/r/20745/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Timothy Chen
> 
>

Reply via email to