You're not doing anything wrong, but I suspect you're requesting task
reconfiguration more frequently than was originally envisioned, so the
current implementation isn't well suited to your case.

I'm not sure how much effort is required to implement this new behavior.
The logic for the standalone worker is pretty straightforward, but the
logic for the distributed worker is going to be much more involved. We
also need to be careful about changing existing behavior, since it's not
hard to imagine connectors that expect all tasks to be restarted whenever
any task configuration changes. If there's any potential that this is the
case, we'd have to keep the existing behavior as the default and make the
new behavior opt-in.

One possibility is to add an overloaded requestTaskReconfiguration(boolean
changedOnly) that specifies whether only changed tasks should be
reconfigured. The existing requestTaskReconfiguration() method could then
delegate to requestTaskReconfiguration(false), leaving the herder
implementations to handle the flag.
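To make the idea concrete, here's a minimal sketch of what the overload
might look like. This is not the actual ConnectorContext interface from
Kafka Connect 0.10.2.1 -- the names and the default-method delegation are
just one way of keeping the old call path behaving exactly as it does
today:

```java
// Hypothetical sketch only; not the real Kafka Connect API.
interface ConnectorContext {

    // The existing no-arg method keeps today's semantics (restart all
    // tasks) by delegating with changedOnly = false.
    default void requestTaskReconfiguration() {
        requestTaskReconfiguration(false);
    }

    // Proposed overload: when changedOnly is true, the herder would
    // restart only the tasks whose configurations actually changed.
    void requestTaskReconfiguration(boolean changedOnly);
}

public class ReconfigSketch {
    public static void main(String[] args) {
        // Fake herder that just records the flag it was asked for.
        final boolean[] lastFlag = new boolean[1];
        ConnectorContext ctx = changedOnly -> lastFlag[0] = changedOnly;

        ctx.requestTaskReconfiguration();      // legacy call path
        System.out.println("legacy changedOnly=" + lastFlag[0]);

        ctx.requestTaskReconfiguration(true);  // opt in to delta behavior
        System.out.println("delta changedOnly=" + lastFlag[0]);
    }
}
```

Existing connectors keep compiling and keep their current behavior; only
connectors that explicitly pass true would see the new semantics.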

But again, the bigger challenge is to implement this new behavior in the
DistributedHerder. OTOH, perhaps it's not as complicated as I might guess.
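For what it's worth, the delta computation itself seems simple enough in
isolation -- the hard part is wiring it through the herder's rebalance
logic. A rough sketch of the diff step, with names of my own invention
(this is not DistributedHerder code), assuming task configs are compared
positionally by index:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hedged sketch: given the task configs before and after a
// reconfiguration request, compute which task indices need a restart.
// A task whose config map is unchanged at the same index keeps running.
public class TaskDiff {

    public static List<Integer> tasksToRestart(
            List<Map<String, String>> oldConfigs,
            List<Map<String, String>> newConfigs) {
        List<Integer> restart = new ArrayList<>();
        for (int i = 0; i < newConfigs.size(); i++) {
            // Restart a task if it is brand new (no old config at this
            // index) or its config map changed.
            if (i >= oldConfigs.size()
                    || !oldConfigs.get(i).equals(newConfigs.get(i))) {
                restart.add(i);
            }
        }
        // Tasks with index >= newConfigs.size() would simply be stopped;
        // that is a separate list not computed here.
        return restart;
    }
}
```

In the distributed case the comparison would additionally have to survive
config-topic round-trips and rebalances, which is where I'd expect most of
the real complexity to live.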



On Tue, May 16, 2017 at 4:57 AM, Per Steffensen <perst...@gmail.com> wrote:

> Hi
>
> Kafka (Connect) 0.10.2.1
>
> I am writing my own SourceConnector. It will communicate with a remote
> server and continuously calculate the set of tasks that have to be running.
> Each task also makes a connection to the remote server from which it will
> get its data to forward to Kafka.
>
> When the SourceConnector realizes that the set of tasks has to be
> modified, it makes sure the taskConfigs method returns configs for the
> new, complete set of tasks (likely including tasks that already existed
> before, probably some new tasks, and maybe omitting some of the existing
> tasks). After that, the SourceConnector calls
> context.requestTaskReconfiguration. This results in the current instance
> of my SourceConnector and all existing/running tasks getting stopped, a
> new instance of my SourceConnector being created, and all tasks (those
> that existed before and new ones) being started.
>
> It all works nicely, but because my SourceConnector and my SourceTasks
> have to (re)establish connections and (re)initialize the streaming of
> data, and because my set of tasks changes fairly often and very often
> contains tasks that were also in the set before the change, I end up
> with lots of stops and starts of tasks that really just ought to keep
> running.
>
> Any plans on making this more delta-ish, so that when doing a
> requestTaskReconfiguration:
> * Only tasks that were not already in the task-config set before the
> requestTaskReconfiguration are started
> * Only tasks that were in the task-config set before the
> requestTaskReconfiguration, but not in the set after, are stopped
> * Tasks that are in the task-config set both before and after the
> requestTaskReconfiguration are just allowed to keep running, without
> restarting
> * Not so important: do not create a new instance of the SourceConnector
> just because its task-config set changed
>
> Or am I doing something wrong in my SourceConnector? Is there a
> different way that I should maintain a dynamic set of tasks?
>
> Thanks!!!
>
> Regards, Per Steffensen
>
>
