Hi,
This is Manjeet from Operative Media. I have been working with Confluent
Kafka for almost 4 years and have made many customizations to Kafka Connect
sink and source connectors. I have also made changes to the Kafka code base
for our requirements.
I recently added a feature, after discussing it with our architect Praveen
Manvi, that I would like to discuss with you for use by the larger community.
Background :- We run more than 30 connectors at Operative, but each connector
requires a different machine specification. For example, the Kafka Connect S3
connector requires more memory, and some of our in-house connectors require
more network bandwidth (IO) and processing power (CPU). We were getting
out-of-memory errors in a worker because of one connector. This affected
everything else running on that worker, and we had to pause the connector.
Issue :- We want each connector to run on a specific type of machine (in our
case, 3 types of machine: memory, CPU and IO).
Existing solution :- We can start 3 clusters, each with one specific type of
machine, but this is difficult to manage.
Pain points :-
1. Every time we start a machine we have to make sure it joins the intended
   cluster; otherwise it can start in a different cluster.
2. We have to change the offset storage topic for each cluster; otherwise
   connectors will be visible across clusters.
Proposed solution :- We specify the machine type in the distributed properties
of each worker, and the target machine type when starting a connector. The
connector's tasks should then start only on workers of exactly that type. With
this, we don't have to deal with the pain points above, and machines of
different types are part of the same cluster.
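
To make the idea concrete, here is a minimal Python sketch of the matching
rule. It is only an illustration, not the actual change to the Kafka Connect
code base. The property names worker.machine.type and target.machine.type are
hypothetical (this mail does not fix the real names), and letting connectors
without a target type run on any worker is an assumption as well.

```python
# Hypothetical property names, chosen only for this illustration.
WORKER_TYPE_KEY = "worker.machine.type"   # would go in the worker's distributed properties
TARGET_TYPE_KEY = "target.machine.type"   # would go in the connector config


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def worker_matches(worker_props, connector_config):
    """Return True if this worker may run tasks of the given connector."""
    target = connector_config.get(TARGET_TYPE_KEY)
    if target is None:
        # Assumption: a connector with no target type may run on any worker.
        return True
    return worker_props.get(WORKER_TYPE_KEY, "").lower() == target.lower()


worker_props = parse_properties("""
group.id=connect-cluster
worker.machine.type=memory
""")

s3_sink = {"name": "s3-sink", "tasks.max": "2", TARGET_TYPE_KEY: "memory"}
inhouse = {"name": "inhouse", "tasks.max": "2", TARGET_TYPE_KEY: "cpu"}

print(worker_matches(worker_props, s3_sink))
print(worker_matches(worker_props, inhouse))
```

The memory worker accepts the S3 sink and rejects the CPU-bound connector,
while every worker keeps the same group.id and so stays in a single cluster.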
Example :- I have 4 workers with types memory (worker 1), CPU (worker 2) and
IO (worker 3 and worker 4).
a) We start connector 1 with 2 tasks and specify the target machine type
   as IO. It will distribute the tasks equally across worker 3 and worker 4.
b) We start connector 2 with 2 tasks and specify the target machine type
   as memory. It will start both tasks on worker 1.
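
The example can be reproduced with a small Python sketch that filters workers
by type and then assigns tasks round-robin. This is an assumed, simplified
model and not the real Kafka Connect assignment code; the function name
assign_tasks and raising an error when no worker matches are choices made
only for this illustration.

```python
from collections import defaultdict
from itertools import cycle


def assign_tasks(workers, connector, num_tasks, target_type):
    """Round-robin a connector's tasks over workers of the target type.

    workers: dict mapping worker name -> machine type.
    Returns a dict mapping worker name -> list of task ids.
    """
    eligible = sorted(w for w, t in workers.items() if t == target_type)
    if not eligible:
        # Assumption: fail loudly rather than fall back to another type.
        raise ValueError(f"no worker of type {target_type!r} in the cluster")
    assignment = defaultdict(list)
    for task_no, worker in zip(range(num_tasks), cycle(eligible)):
        assignment[worker].append(f"{connector}-{task_no}")
    return dict(assignment)


workers = {
    "worker1": "memory",
    "worker2": "cpu",
    "worker3": "io",
    "worker4": "io",
}

print(assign_tasks(workers, "connector1", 2, "io"))
print(assign_tasks(workers, "connector2", 2, "memory"))
```

A two-task connector targeting the IO workers gets one task each on worker 3
and worker 4, and a two-task connector targeting memory gets both tasks on
worker 1, all within one cluster.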
I have implemented this feature and it is working fine; we are pushing it to
our production cluster in a few days.
Please let me know whether it would be helpful for the larger community.
Thanks,
Manjeet Duhan