Thank you. Really appreciate the information. Antonio.
On Mon, Nov 22, 2021 at 5:35 PM Robert Bradshaw <[email protected]> wrote:
> This is inherent to the way that SDFs operate. Essentially,
>
>     ParDo(someSdfInstance)
>
> gets expanded into
>
>     Map(element -> (element, restriction)).shuffle().FlatMap(someSdfInstance.process)
>
> On Mon, Nov 22, 2021 at 5:28 PM Antonio Si <[email protected]> wrote:
> >
> > Thanks Robert. Do you mind pointing me to the code that shuffles the work and
> > assigns it to the workers? I traced through the code where it loops through the
> > topic partitions and, for each, creates a KafkaSourceDescriptor and calls
> > ReadFromKafkaDoFn.initialRestriction(). But I couldn't find where it gets
> > shuffled and assigned to the workers.
> >
> > Thank you very much.
> >
> > Antonio.
> >
> > On Mon, Nov 22, 2021 at 5:05 PM Robert Bradshaw <[email protected]> wrote:
> >>
> >> The source code can be found at
> >> https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
> >>
> >> In a nutshell, the full set of (topic, partitionIndex) pairs gets shuffled
> >> and assigned to the workers randomly, and each pair gets processed by
> >> a SplittableDoFn that emits the actual data on that topic/partition.
> >>
> >> On Mon, Nov 22, 2021 at 12:07 PM Antonio Si <[email protected]> wrote:
> >> >
> >> > Hello Beam community,
> >> >
> >> > I am experimenting with the new SDF KafkaIO in a bit more detail, and I
> >> > have a quick question: how do the topics and partitions get assigned to
> >> > each Task Manager?
> >> >
> >> > Can someone point me to the code?
> >> >
> >> > Thanks in advance.
> >> >
> >> > Antonio.
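For anyone following along, here is a plain-Java sketch of the expansion Robert describes — it is not Beam's actual implementation, just an illustration of the shape `Map(element -> (element, restriction)).shuffle().FlatMap(process)`. The `Descriptor` record stands in for a KafkaSourceDescriptor, the offset-0 restriction mimics what `initialRestriction()` would return, and the final map is a stand-in for the SDF's process method; all of those names and the fixed seed are assumptions for the example.

```java
import java.util.*;
import java.util.stream.*;

public class SdfExpansionSketch {

    // Hypothetical stand-in for a KafkaSourceDescriptor: one (topic, partition) pair.
    record Descriptor(String topic, int partition) {}

    static List<String> expandAndProcess(List<Descriptor> elements, long seed) {
        // Step 1: Map — pair each element with its initial restriction
        // (analogous to ReadFromKafkaDoFn.initialRestriction()); here, offset 0.
        List<Map.Entry<Descriptor, Long>> withRestrictions = elements.stream()
            .map(d -> Map.entry(d, 0L))
            .collect(Collectors.toCollection(ArrayList::new));

        // Step 2: shuffle — this is the step that spreads the
        // (element, restriction) pairs across workers at random.
        Collections.shuffle(withRestrictions, new Random(seed));

        // Step 3: FlatMap — run each pair through the SDF's process method
        // (a stand-in here that just emits one label per pair).
        return withRestrictions.stream()
            .map(e -> e.getKey().topic() + "-" + e.getKey().partition()
                      + "@offset" + e.getValue())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Descriptor> descriptors = List.of(
            new Descriptor("orders", 0),
            new Descriptor("orders", 1),
            new Descriptor("clicks", 0));
        // Every (topic, partition) pair comes out exactly once, in shuffled order.
        System.out.println(expandAndProcess(descriptors, 42L));
    }
}
```

The key point the sketch makes is that assignment to workers happens at the shuffle boundary, not in the loop that builds the descriptors — which is why tracing `initialRestriction()` alone never shows it.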
