Spark just isn't a good fit for pinning particular computation to a particular executor, especially if you're relying on that pinning for correctness.
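(The spark-streaming-kafka-0-10 integration does expose LocationStrategies.PreferFixed as a hint mapping partitions to hosts, but it's best-effort scheduling, not a guarantee.) A safer route than pinning is to make the writes themselves collision-free: key each Cassandra row by (topic, partition, offset) or similar, so that whichever executor processes a given Kafka partition, no two tasks ever target the same row. The sketch below is not Spark or Kafka API — it's a pure-Python model (worker names and record counts are made up) of why disjoint per-partition ownership plus key design removes the collision:

```python
# Pure-Python model, NOT Spark/Kafka API: shows why per-partition ownership
# plus row keys that embed (topic, partition, offset) make concurrent
# writers collision-free -- the property the question is really after.
import threading

TOPICS = ["t0", "t1"]

# Desired layout from the question: executor e0 owns partition 0 of every
# topic, e1 owns partition 1. (Names are illustrative.)
assignment = {
    "e0": [(t, 0) for t in TOPICS],
    "e1": [(t, 1) for t in TOPICS],
}

store = {}               # stands in for the Cassandra table
lock = threading.Lock()  # only to keep the toy dict itself thread-safe

def consume(executor, owned):
    # Each worker writes rows whose key embeds its (topic, partition),
    # so two workers can never target the same row.
    for topic, partition in owned:
        for offset in range(5):  # pretend 5 records per partition
            row_key = (topic, partition, offset)
            with lock:
                store[row_key] = executor

workers = [threading.Thread(target=consume, args=(e, owned))
           for e, owned in assignment.items()]
for w in workers:
    w.start()
for w in workers:
    w.join()

# Every row written exactly once: 2 topics * 2 partitions * 5 offsets.
assert len(store) == 20
# Each (topic, partition) was handled by exactly one worker.
assert all(len({store[(t, p, o)] for o in range(5)}) == 1
           for t in TOPICS for p in (0, 1))
```

The point is that sequential processing per executor isn't needed for safety: as long as the row keys written by different partitions are disjoint, parallel writes can't lose each other's data.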
On Thu, Mar 16, 2017 at 7:16 AM, OUASSAIDI, Sami <sami.ouassa...@mind7.fr> wrote:
>
> Hi all,
>
> So I need to specify how an executor should consume data from a Kafka topic.
>
> Let's say I have two topics, t0 and t1, with two partitions each, and two
> executors, e0 and e1 (both can be on the same node, so the assign strategy
> does not work: on a multi-executor node it works on round-robin scheduling,
> and whichever executor is first available consumes the topic partition).
>
> What I would like to do is make e0 consume partition 0 from both t0 and t1,
> while e1 consumes partition 1 from t0 and t1. Is there no way around it
> except messing with the scheduling? If so, what's the best approach?
>
> The reason for doing so is that the executors will write to a Cassandra
> database, and since we will be in a parallelized context one executor might
> "collide" with another and data would then be lost; by assigning a partition
> I want to force the executor to process the data sequentially.
>
> Thanks,
> Sami
> --
> *Mind7 Consulting*
>
> Sami Ouassaid | Consultant Big Data | sami.ouassa...@mind7.com
> __
>
> 64 Rue Taitbout, 75009 Paris