[ https://issues.apache.org/jira/browse/KAFKA-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102200#comment-16102200 ]

Per Steffensen commented on KAFKA-5505:
---------------------------------------

bq. There's been some discussion about more incremental rebalancing, but as you 
add/remove tasks, there's no way to avoid the fact that to keep the work 
balanced we may need to stop/start/move some tasks

I can handle restarts of tasks! It is just a significant overhead if it happens 
all the time - and it does, as long as all tasks are restarted every time the 
set of tasks changes. It would not be a problem if it happened "from time to 
time" due to rebalancing. What annoys me most is actually that the connector 
itself is restarted when the set of tasks changes - there is no good reason 
for that at all, as I see it. The problem is that it takes some time before my 
connector can build up its set of tasks after it (re)starts, because it has to 
talk to other components to get the entire set of tasks. But the connector has 
to give a set of tasks almost immediately after (re)starting, or things start 
behaving strangely. Therefore my connector has to start out saying that its 
set of tasks is empty, and then change the set of tasks (calling 
context.requestTaskReconfiguration) along the way, as it learns about more and 
more tasks. But when it does so, the connector itself is restarted and starts 
over with an empty set of tasks. That makes the process go: connector started 
-> empty set of tasks -> some tasks -> connector restarted -> empty set of 
tasks -> some tasks -> connector restarted -> ... I really have to hack around 
this to make it work.
If we could just make a change so that the connector is not restarted when it 
changes its set of tasks, it would be a big step.
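
To make the pattern concrete, here is a minimal sketch of what such a connector 
ends up doing today. The class, helper, and property names are made up; only 
the SourceConnector/ConnectorContext calls are the real Connect API.

{code:java}
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.source.SourceConnector;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;

public class MySourceConnector extends SourceConnector {
    // "Partitions in the source" discovered so far; empty right after (re)start.
    private final Set<String> discoveredPartitions = ConcurrentHashMap.newKeySet();

    @Override
    public void start(Map<String, String> props) {
        // Discovery has to talk to other components, so it runs in the background;
        // taskConfigs() must be answerable almost immediately after (re)start.
        Executors.newSingleThreadExecutor().submit(() -> {
            for (String partition : discoverPartitions()) {     // made-up helper
                discoveredPartitions.add(partition);
                // Today this restarts the connector and all existing tasks,
                // throwing away everything discovered so far.
                context.requestTaskReconfiguration();
            }
        });
    }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        List<Map<String, String>> configs = new ArrayList<>();
        for (String partition : discoveredPartitions) {
            // "source.partition" is a made-up property name.
            configs.add(Collections.singletonMap("source.partition", partition));
        }
        return configs;   // empty at first, grows as discovery progresses
    }

    private List<String> discoverPartitions() {
        // Stands in for the slow conversation with the other components.
        return Collections.emptyList();
    }

    @Override
    public Class<? extends Task> taskClass() { return MySourceTask.class; }

    @Override
    public ConfigDef config() { return new ConfigDef(); }

    @Override
    public void stop() { }

    @Override
    public String version() { return "0"; }

    // Minimal task stub so the example compiles; the real task has heavy initialization.
    public static class MySourceTask extends SourceTask {
        @Override public void start(Map<String, String> props) { }
        @Override public List<SourceRecord> poll() throws InterruptedException { return null; }
        @Override public void stop() { }
        @Override public String version() { return "0"; }
    }
}
{code}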

bq. Can you explain why you have task sets changing so frequently?

Ohhh, it is a fairly long explanation in my case. But in general I do not have 
a hard time imagining connectors with a changing set of tasks. I believe you 
already ship a source connector out of the box that can copy from a relational 
database table. Imagine extending it to copy all tables of that database, 
running one task per table - I guess that would be a fairly reasonable 
extension. If the set of tables changes often, the set of tasks of this 
connector would change often.
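
A hedged sketch of how such a "copy every table" connector might derive its 
task configs (names are illustrative, not an existing connector's properties):

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Illustrative only: one task config per table, capped at maxTasks.
public class OneTaskPerTable {
    public static List<Map<String, String>> taskConfigs(List<String> currentTables, int maxTasks) {
        List<Map<String, String>> configs = new ArrayList<>();
        int n = Math.min(currentTables.size(), maxTasks);
        for (String table : currentTables.subList(0, n)) {
            configs.add(Collections.singletonMap("table", table));   // made-up property name
        }
        // Whenever a table is created or dropped, the connector would call
        // context.requestTaskReconfiguration() and this list would change.
        return configs;
    }
}
{code}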

bq. It's possible that a different way of assigning partitions to tasks might 
avoid rebalancing all the time.

Well, that is what I did for now. Actually I changed it so that I always have 
exactly one task, and inside that single task I handle all the work that would 
otherwise be distributed between tasks. My single task runs one thread per 
"partition in the source" - basically one thread where I would have liked to 
have one task. It works the same, but it will not scale, because that one task 
has to run on one machine. Being able to split into several tasks would help it 
scale. One machine will definitely be able to handle one "partition in the 
source", but it may not be able to handle "all partitions in the source".
I could also take this principle and scale to another fixed number N of tasks, 
higher than one. Then task no. M (M from 0 to N-1) would handle the "partitions 
in the source" P where hash(id-of-P) modulo N is M.

So I have ways around the problem, but I think the requested change would be 
nice in general, and something people will expect to be available, especially 
since it is already possible to change the set of tasks along the way - I know 
I was surprised that it did not already work the way I request here.

> Connect: Do not restart connector and existing tasks on task-set change
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-5505
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5505
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 0.10.2.1
>            Reporter: Per Steffensen
>
> I am writing a connector with a frequently changing task-set. It is really 
> not working very well, because the connector and all existing tasks are 
> restarted when the set of tasks changes. E.g. if the connector is running 
> with 10 tasks, and an additional task is needed, the connector itself and all 
> 10 existing tasks are restarted, just to make the 11th task run also. My 
> tasks have a fairly heavy initialization, making it extra annoying. I would 
> like to see a change, introducing a "mode", where only new/deleted tasks are 
> started/stopped when notifying the system that the set of tasks changed 
> (calling context.requestTaskReconfiguration() - or something similar).
> Discussed this issue a little on d...@kafka.apache.org in the thread "Kafka 
> Connect: To much restarting with a SourceConnector with dynamic set of tasks"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
