> On Jan. 28, 2014, 11:16 p.m., Guozhang Wang wrote:
> > In all, I would vote for marking the deletion as failed when some brokers
> > are unavailable, but with this I still feel the delete-topic logic is
> > pretty complicated, interleaving with other operations. Since in most
> > operational scenarios there should be just one topic operation (adding
> > partitions, partition reassignment, leader election, delete topic, create
> > topic) in flight at a time, I think it might be better to just execute
> > these operations sequentially, ordered by the time they are triggered by
> > the admin and captured by the controller. Some details of the proposal:
> >
> > 1) Besides the callback thread for zk-fired events, we can use just one
> > more thread to periodically check whether any operations are needed
> > (partition reassignment, delete topics, etc.) and, if so, put them in the
> > work queue, and another thread that keeps processing tasks from the queue
> > sequentially. The only synchronization we need then is between this
> > background worker thread and the controller ZK callback thread.
> >
> > 2) As for the non-atomicity of some of the operations: since broker
> > failure during these operations should be rare, I think it is affordable
> > to simply abort and retry the operation and hope that the failed broker's
> > replicas have already been moved offline and hence no longer block the
> > execution. When these brokers restart, the zk-fired callback thread would
> > decide whether the offline replicas on these brokers need to be moved to
> > online or non-existent based on the current topic-partition assignment
> > info, and send the requests accordingly.
> >
> > 3) For controller failover it is much the same: the current in-progress
> > operation would fail, and the new controller would eventually retry it,
> > although maybe not as the first task, since the ordering in the queue can
> > change.
> >
> > 4) One last thing we need to make sure of is that all the controller
> > requests are idempotent on the brokers, since on controller failover or
> > operation retry they can be sent multiple times. I think this is already
> > guaranteed today, but we just need to double-check.
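A minimal sketch of the proposal above, written in Java for brevity (all names are invented for illustration and do not come from the Kafka codebase): operations are enqueued by the zk callback thread or a periodic scanner, and a single worker drains them sequentially, so the only synchronization needed is around the queue itself. Applying an operation is written to be idempotent (point 4), so an abort-and-retry after a broker or controller failure (points 2 and 3) is harmless.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch of a sequential admin-operation queue, here reduced
// to topic deletion only. Submitters (zk callbacks, periodic scanner) only
// enqueue; all state mutation happens in the single worker's drain pass.
public class AdminOpWorker {
    enum TopicState { LIVE, DELETED }

    private final Queue<String> deleteQueue = new ArrayDeque<>();
    private final Map<String, TopicState> topics = new HashMap<>();
    final List<String> log = new ArrayList<>();

    synchronized void createTopic(String topic) {
        topics.put(topic, TopicState.LIVE);
    }

    synchronized void submitDelete(String topic) {
        deleteQueue.add(topic);
    }

    // One drain pass of the single worker thread.
    void drainOnce() {
        String topic;
        while ((topic = poll()) != null) {
            apply(topic);
        }
    }

    private synchronized String poll() {
        return deleteQueue.poll();
    }

    // Idempotent: re-applying a delete (e.g. a duplicate sent after a
    // controller failover or an operation retry) observes the topic is
    // already DELETED and does nothing.
    private synchronized void apply(String topic) {
        if (topics.get(topic) == TopicState.DELETED) return;
        topics.put(topic, TopicState.DELETED);
        log.add("deleted " + topic);
    }
}
```

The point of the sketch is that once all mutation funnels through one worker, duplicate submissions (the failover/retry case from points 2-4) become a state check rather than a cross-thread coordination problem.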
I think you are suggesting something similar to what I recommended in my
description: the cleanup would involve representing the various admin
operations in a uniform way, with just one background thread working on
them. #2, #3 and #4 are all addressed in the current patch; the only
outstanding issue is the extra thread. I think this cleanup is necessary,
but the ordering between the zk events and the admin operations in the
presence of retries needs to be thought through very carefully. Given that
the cleanup itself is pretty tricky, I think it is better to do it in a
separate patch instead of combining it with delete topic, which is
considerably tricky all by itself.


> On Jan. 28, 2014, 11:16 p.m., Guozhang Wang wrote:
> > core/src/main/scala/kafka/controller/ControllerChannelManager.scala, line 203
> > <https://reviews.apache.org/r/17460/diff/1/?file=452943#file452943line203>
> >
> > We do not need to clear the map here, since if they are empty an error
> > will be thrown before. This seems to be a revert of some previously
> > committed patch.

Yes, you're right. This was introduced when I was rebasing from trunk.


> On Jan. 28, 2014, 11:16 p.m., Guozhang Wang wrote:
> > core/src/main/scala/kafka/controller/ControllerChannelManager.scala, line 210
> > <https://reviews.apache.org/r/17460/diff/1/?file=452943#file452943line210>
> >
> > The callback parameter is not used in this function. If that is on
> > purpose, we need to note it in the comments.

Yes, I introduced the callbacks in this patch, but didn't intend to fix all
the places that could use them. For now, I pass a callback in all the
requests, but only the stop replica request ends up using it. I still think
it is better to have all the APIs look similar, so it is less confusing to
understand how callbacks interact with the request processing.


> On Jan. 28, 2014, 11:16 p.m., Guozhang Wang wrote:
> > core/src/main/scala/kafka/controller/KafkaController.scala, line 86
> > <https://reviews.apache.org/r/17460/diff/1/?file=452944#file452944line86>
> >
> > This also seems to be a revert of some previously committed patch.

Not really. This is on purpose, and I have explained the reasoning in my
previous comments on the patch.


> On Jan. 28, 2014, 11:16 p.m., Guozhang Wang wrote:
> > core/src/main/scala/kafka/controller/KafkaController.scala, line 162
> > <https://reviews.apache.org/r/17460/diff/1/?file=452944#file452944line162>
> >
> > Do we need one background thread for each of these three tasks, or can
> > we arrange them into one thread? The benefit of doing so is that we no
> > longer need to synchronize between these threads.

Agreed. I thought about this and concluded that we need a cleanup of the
controller where the different tasks are expressed in a uniform way to a
single thread. However, this is a pretty big cleanup in itself, and I
prefer to do it independently of delete topic for simplicity.


> On Jan. 28, 2014, 11:16 p.m., Guozhang Wang wrote:
> > core/src/main/scala/kafka/controller/KafkaController.scala, line 388
> > <https://reviews.apache.org/r/17460/diff/1/?file=452944#file452944line388>
> >
> > Would this lead to a deadlock, meaning the delete-topic will not
> > complete until partition-reassignment finishes, while
> > partition-reassignment will not resume since the topic is being deleted?

I realized this a few minutes after I uploaded my patch; the fix was part
of my local changes. Fixed in the newer diff.


> On Jan. 28, 2014, 11:16 p.m., Guozhang Wang wrote:
> > core/src/main/scala/kafka/controller/KafkaController.scala, line 111
> > <https://reviews.apache.org/r/17460/diff/1/?file=452944#file452944line111>
> >
> > Deleted topics should not be in partitionsBeingReassigned and
> > partitionsUndergoingPreferredReplicaElection; if they are, it should be
> > treated as an error.

Good point. Removed it.
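The uniform-callback shape discussed above (every request API takes a callback, even where it is currently unused) could look roughly like this; the sketch is in Java with invented names, not the actual ControllerChannelManager API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: all controller request senders share the same
// signature, including a completion callback, so the APIs look alike.
// In the patch under review only the stop replica request actually
// invokes its callback; the others accept one purely for uniformity.
public class ControllerRequests {
    final List<String> sent = new ArrayList<>();

    void sendLeaderAndIsr(int brokerId, Consumer<Integer> onResponse) {
        sent.add("LeaderAndIsr->" + brokerId);
        // callback accepted for API uniformity but currently unused
    }

    void sendStopReplica(int brokerId, Consumer<Integer> onResponse) {
        sent.add("StopReplica->" + brokerId);
        // only this request fires its callback, e.g. so the delete-topic
        // logic can track which brokers acknowledged replica deletion
        if (onResponse != null) onResponse.accept(brokerId);
    }
}
```

The trade-off is the one noted in the review: uniform signatures make the call sites easier to read, at the cost of parameters that are dead until more request types need completion notifications.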
- Neha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17460/#review33056
-----------------------------------------------------------


On Jan. 28, 2014, 11:19 p.m., Neha Narkhede wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17460/
> -----------------------------------------------------------
> 
> (Updated Jan. 28, 2014, 11:19 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-330
>     https://issues.apache.org/jira/browse/KAFKA-330
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> Fixed docs in a few places
> 
> 
> Fixed the resume logic for partition reassignment to also include topics
> that are queued up for deletion, since topic deletion is halted until
> partition reassignment can finish anyway. We need to let partition
> reassignment finish (since it started before topic deletion) so that
> topic deletion can resume.
> 
> 
> Organized imports
> 
> 
> Moved offline replica handling to controller failover
> 
> 
> Reading replica assignment from zookeeper instead of local cache
> 
> 
> Deleting unused APIs
> 
> 
> Reverting the change to the stop replica request protocol. Instead
> hacking around with callbacks
> 
> 
> All functionality and all unit tests working
> 
> 
> Rebased with trunk after controller cleanup patch
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/admin/AdminUtils.scala a167756f0fd358574c8ccb42c5c96aaf13def4f5 
>   core/src/main/scala/kafka/admin/TopicCommand.scala 842c11047cca0531fbc572fdb25523244ba2b626 
>   core/src/main/scala/kafka/api/StopReplicaRequest.scala 820f0f57b00849a588a840358d07f3a4a31772d4 
>   core/src/main/scala/kafka/api/StopReplicaResponse.scala d7e36308263aec2298e8adff8f22e18212e33fca 
>   core/src/main/scala/kafka/controller/ControllerChannelManager.scala ea8485b479155b479c575ebc89a4f73086c872cb 
>   core/src/main/scala/kafka/controller/KafkaController.scala a0267ae2670e8d5f365e49ec0fa5db1f62b815bf 
>   core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala fd9200f3bf941aab54df798bb5899eeb552ea3a3 
>   core/src/main/scala/kafka/controller/PartitionStateMachine.scala ac4262a403fc73edaecbddf55858703c640b11c0 
>   core/src/main/scala/kafka/controller/ReplicaStateMachine.scala 483559aa64726c51320d18b64a1b48f8fb2905a0 
>   core/src/main/scala/kafka/server/KafkaApis.scala bd7940b80ca1f1fa4a671c49cf6be1aeec2bbd7e 
>   core/src/main/scala/kafka/server/KafkaHealthcheck.scala 9dca55c9254948f1196ba17e1d3ebacdcd66be0c 
>   core/src/main/scala/kafka/server/OffsetCheckpoint.scala b5719f89f79b9f2df4b6cb0f1c869b6eae9f8a7b 
>   core/src/main/scala/kafka/server/ReplicaManager.scala f9d10d385cee49a1e3be8c82e3ffa22ef87a8fd6 
>   core/src/main/scala/kafka/server/TopicConfigManager.scala 42e98dd66f3269e6e3a8210934dabfd65df2dba9 
>   core/src/main/scala/kafka/server/ZookeeperLeaderElector.scala b189619bdc1b0d2bba8e8f88467fce014be96ccd 
>   core/src/main/scala/kafka/utils/ZkUtils.scala b42e52b8e5668383b287b2a86385df65e51b5108 
>   core/src/test/scala/unit/kafka/admin/AdminTest.scala 59de1b469fece0b28e1d04dcd7b7015c12576abb 
>   core/src/test/scala/unit/kafka/admin/DeleteTopicTest.scala PRE-CREATION 
>   core/src/test/scala/unit/kafka/api/RequestResponseSerializationTest.scala 8df0982a1e71e3f50a073c4ae181096d32914f3e 
>   core/src/test/scala/unit/kafka/server/LogOffsetTest.scala 9aea67b140e50c6f9f1868ce2e2aac9e7530fa77 
>   core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala c0475d07a778ff957ad266c08a7a81ea500debd2 
>   core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 03e6266ffdad5891ec81df786bd094066b78b4c0 
>   core/src/test/scala/unit/kafka/utils/TestUtils.scala d88b6c3e8fd8098d540998b6a82a65cec8a1dcb0 
> 
> Diff: https://reviews.apache.org/r/17460/diff/
> 
> 
> Testing
> -------
> 
> Several integration tests added to test -
> 
> 1. Topic deletion when all replica brokers are alive
> 2. Halt and resume topic deletion after a follower replica is restarted
> 3. Halt and resume topic deletion after a controller failover
> 4. Request handling during topic deletion
> 5. Topic deletion and partition reassignment in parallel
> 6. Topic deletion and preferred replica election in parallel
> 7. Topic deletion and per topic config changes in parallel
> 
> 
> Thanks,
> 
> Neha Narkhede
> 
>