[ https://issues.apache.org/jira/browse/KAFKA-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102200#comment-16102200 ]
Per Steffensen commented on KAFKA-5505:
---------------------------------------

bq. There's been some discussion about more incremental rebalancing, but as you add/remove tasks, there's no way to avoid the fact that to keep the work balanced we may need to stop/start/move some tasks

I can handle restarts of tasks! It is just a significant overhead if it happens all the time - and it does, as long as all tasks are restarted every time the set of tasks changes. It will not be a problem if it happens "from time to time" due to a rebalance.

What annoys me most is actually that the connector itself is restarted when the set of tasks changes - there is no good reason for that at all, as I see it? The problem is that it takes some time before my connector can build up its set of tasks after it (re)starts, because it has to talk to other components to get the entire set of tasks. But the connector has to give a set of tasks almost immediately after (re)start, or things will start behaving strangely. Therefore my connector has to start out saying that its set of tasks is empty, and then change the set of tasks (calling context.requestTaskReconfiguration) along the way, as it learns about more and more tasks. But when it does so, the connector itself is restarted and starts over with an empty set of tasks. It makes the process go:

connector started -> empty set of tasks -> some tasks -> connector restarted -> empty set of tasks -> some tasks -> connector restarted -> ...

I really have to hack to make it work. If we could just make a change so that the connector is not restarted when it changes its set of tasks, it would be a big step.

bq. Can you explain why you have task sets changing so frequently?

Ohhh, it is a fairly long explanation in my case. But in general I do not have a hard time imagining connectors with a changing set of tasks. I believe you already have a source-connector out-of-the-box that can copy from a relational database table.
Imagine that you would like to extend it to be able to copy all tables of that database, running one task per table. I guess that would be a fairly reasonable extension. If the set of tables changes often, the set of tasks of this connector would change often.

bq. It's possible that a different way of assigning partitions to tasks might avoid rebalancing all the time.

Well, I did that for now. Actually I changed it so that I always have exactly one task, and inside that single task I handle all the stuff that would otherwise be distributed between tasks. My single task runs one thread per "partition in the source" - basically one thread where I would like to have had one task. It works the same, but it will not scale, because one task has to run on one machine. Being able to split into several tasks will help scale. One machine will definitely be able to handle one "partition in the source", but it may not be able to handle "all partitions in the source".

I could also take this principle and scale to another fixed number (N) of tasks, higher than one. Then task number M (M from 0 to N-1) will handle the "partitions in the source" P where hash(id-of-P) modulo N is M.

So I have ways around the problem, but I think the requested change would be nice in general, and something people will expect to be available, especially since it is possible to change the set of tasks along the way - I know I was surprised that it did not already work the way I request here.

> Connect: Do not restart connector and existing tasks on task-set change
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-5505
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5505
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 0.10.2.1
>            Reporter: Per Steffensen
>
> I am writing a connector with a frequently changing task-set.
> It is really not working very well, because the connector and all existing
> tasks are restarted when the set of tasks changes. E.g. if the connector is
> running with 10 tasks, and an additional task is needed, the connector itself
> and all 10 existing tasks are restarted, just to make the 11th task run also.
> My tasks have a fairly heavy initialization, making it extra annoying. I would
> like to see a change, introducing a "mode", where only new/deleted tasks are
> started/stopped when notifying the system that the set of tasks changed
> (calling context.requestTaskReconfiguration() - or something similar).
> Discussed this issue a little on d...@kafka.apache.org in the thread "Kafka
> Connect: To much restarting with a SourceConnector with dynamic set of tasks"

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
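
The fixed-N workaround mentioned in the comment (task M handles every source partition P where hash(id-of-P) modulo N is M) could look roughly like the sketch below. It is only an illustration under assumptions: partition ids are taken to be strings, String.hashCode() stands in for the hash, and the class and method names (FixedTaskAssignment, assign) are hypothetical and not part of Kafka Connect.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Kafka Connect class.
final class FixedTaskAssignment {

    /**
     * Distributes source partition ids over exactly numTasks buckets.
     * Bucket M receives every id P where hash(P) modulo numTasks equals M.
     */
    static List<List<String>> assign(List<String> partitionIds, int numTasks) {
        if (numTasks < 1) {
            throw new IllegalArgumentException("numTasks must be at least 1");
        }
        List<List<String>> buckets = new ArrayList<>();
        for (int m = 0; m < numTasks; m++) {
            buckets.add(new ArrayList<>());
        }
        for (String id : partitionIds) {
            // floorMod keeps the index in [0, numTasks) even for negative hash codes.
            int m = Math.floorMod(id.hashCode(), numTasks);
            buckets.get(m).add(id);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Assumed example ids, e.g. table names in the source database.
        List<String> tables = Arrays.asList("orders", "customers", "invoices", "payments");
        List<List<String>> buckets = assign(tables, 3);
        for (int m = 0; m < buckets.size(); m++) {
            System.out.println("task " + m + " -> " + buckets.get(m));
        }
    }
}
```

Because N stays fixed, a new or removed source partition only changes the contents of one bucket and never the number of tasks. In a real connector each bucket would presumably become one task configuration returned from Connector.taskConfigs(int maxTasks), and each task would still have to discover changes to its own bucket at runtime.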