Hi all,

I need to specify how an executor should consume data from a Kafka topic.

Let's say I have two topics, t0 and t1, with two partitions each, and two
executors, e0 and e1. Both executors can be on the same node, so the assign
strategy does not work: on a node with multiple executors it relies on
round-robin scheduling, and whichever executor is available first consumes
the topic partition.

What I would like to do is make e0 consume partition 0 of both t0 and t1,
while e1 consumes partition 1 of t0 and t1. Is there any way to do this
other than messing with scheduling? If not, what is the best approach?
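
To make the target layout concrete, here is a minimal sketch in plain Python
of the assignment I am after. It does not use any Spark or Kafka API; the
function name and the modulo rule are only my own illustration of the
mapping, not something either library provides.

```python
from collections import defaultdict

# Illustrative values taken from the example above.
TOPICS = ["t0", "t1"]
PARTITIONS_PER_TOPIC = 2
EXECUTORS = ["e0", "e1"]


def desired_assignment(topics, partitions_per_topic, executors):
    """Map each executor to the (topic, partition) pairs it should own.

    Partition p of every topic goes to executor number p % len(executors),
    so the same partition index always lands on the same executor.
    """
    assignment = defaultdict(list)
    for topic in topics:
        for partition in range(partitions_per_topic):
            owner = executors[partition % len(executors)]
            assignment[owner].append((topic, partition))
    return dict(assignment)


if __name__ == "__main__":
    layout = desired_assignment(TOPICS, PARTITIONS_PER_TOPIC, EXECUTORS)
    for executor, owned in layout.items():
        print(executor, owned)
```

With the values above, e0 ends up with partition 0 of t0 and t1, and e1 with
partition 1 of t0 and t1.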

The reason is that the executors will write to a Cassandra database. Since
we will be in a parallelized context, one executor might "collide" with
another, and data would be lost. By pinning each partition to one executor,
I want to force its data to be processed sequentially.

Thanks
Sami
-- 
*Mind7 Consulting*

Sami Ouassaid | Consultant Big Data | sami.ouassa...@mind7.com
64 Rue Taitbout, 75009 Paris