Kafka connect task assignment Improvement ( New Feature )

Manjeet Duhan Mon, 29 Jul 2019 12:05:17 -0700

Hi,

This is Manjeet from Operative Media. I have been working with Confluent 
Kafka for almost 4 years and have made many custom changes to Kafka Connect 
sink and source connectors. I have also made changes to the Kafka code base 
itself for our requirements.


There is one feature I added recently, after discussing it with our architect 
Praveen Manvi, that I would like to discuss with you for wider community use.

Background :- We run more than 30 connectors at Operative, but each connector 
requires a different machine specification. For example, the Kafka Connect S3 
connector requires more memory, while some of our in-house connectors require 
more network bandwidth (IO) and processing power (CPU). We were getting 
out-of-memory errors in a worker because of one connector. This affected the 
whole worker process, and we had to pause that connector.

Issue :- We wanted each connector to run on a specific type of machine (in this 
case we want 3 types of machines: memory, CPU and IO).

Existing Solution :- We can start 3 clusters and have a specific type of machine 
in each cluster, but this is difficult to manage.
          Pain points :-

1.       We have to take care, every time we start a machine, that it joins the 
right cluster; otherwise it can start in a different cluster.

2.       We have to change the offset storage topic for each cluster; otherwise 
connectors will be visible across clusters.

Proposed Solution :- We specify the machine type in the distributed properties 
of each worker, so that when we specify a target machine type while starting a 
connector, its tasks start only on workers of exactly that type. In this case 
we don't have to deal with the pain points above, because the different types 
of machines are all part of the same cluster.

Example :- I have 4 workers with types memory (worker 1), cpu (worker 2) and 
IO (worker 3 and worker 4).


a)       We start connector 1 with 2 tasks and specify the target machine type 
as IO. Its tasks are distributed equally across worker 3 and worker 4.

b)      We start connector 2 with 2 tasks and specify the target machine type 
as memory. Both tasks start on worker 1.
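
The assignment rule in this example can be sketched roughly as below. This is 
only a minimal illustration, not the actual change made to Kafka Connect: the 
class name, the method name and the plain string maps are hypothetical, the 
tasks of connector 1 are given the type shared by worker 3 and worker 4, and 
failing when no worker of the requested type exists is an assumption (the 
behaviour for that case is not described above).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of type-aware task assignment; not Kafka Connect code.
class TypedTaskAssignment {

    /**
     * Assigns every task to a worker whose machine type equals the task's
     * target type, going round-robin over the matching workers.
     *
     * @param workerTypes     worker id -> machine type (e.g. "memory", "cpu", "io")
     * @param taskTargetTypes task id -> target machine type of its connector
     * @return worker id -> list of task ids assigned to that worker
     */
    static Map<String, List<String>> assign(Map<String, String> workerTypes,
                                            Map<String, String> taskTargetTypes) {
        // Group the workers by machine type, keeping their original order.
        Map<String, List<String>> workersByType = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : workerTypes.entrySet()) {
            workersByType.computeIfAbsent(e.getValue(), t -> new ArrayList<>())
                         .add(e.getKey());
        }

        // Every worker gets an entry, even if it ends up with no tasks.
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String worker : workerTypes.keySet()) {
            assignment.put(worker, new ArrayList<>());
        }

        // Number of tasks already placed per machine type (drives the round-robin).
        Map<String, Integer> placedPerType = new HashMap<>();
        for (Map.Entry<String, String> e : taskTargetTypes.entrySet()) {
            String task = e.getKey();
            String type = e.getValue();
            List<String> candidates = workersByType.get(type);
            if (candidates == null) {
                // Assumption: fail loudly when no worker of that type is present.
                throw new IllegalStateException("No worker of type " + type);
            }
            int index = placedPerType.merge(type, 1, Integer::sum) - 1;
            assignment.get(candidates.get(index % candidates.size())).add(task);
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, String> workers = new LinkedHashMap<>();
        workers.put("worker1", "memory");
        workers.put("worker2", "cpu");
        workers.put("worker3", "io");
        workers.put("worker4", "io");

        Map<String, String> tasks = new LinkedHashMap<>();
        tasks.put("connector1-0", "io");
        tasks.put("connector1-1", "io");
        tasks.put("connector2-0", "memory");
        tasks.put("connector2-1", "memory");

        System.out.println(assign(workers, tasks));
    }
}
```

With these inputs, worker 3 and worker 4 each get one task of connector 1, 
worker 1 gets both tasks of connector 2, and worker 2 stays idle.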

I have implemented this feature and it is working fine; we are pushing it to 
our production cluster in a few days.

Please let me know if this would be helpful for the larger community.


Thanks,
Manjeet Duhan
