Thank you. Really appreciate the information.

Antonio.

On Mon, Nov 22, 2021 at 5:35 PM Robert Bradshaw <[email protected]> wrote:

> This is inherent to the way that SDFs operate. Essentially,
>
>     ParDo(someSdfInstance)
>
> gets expanded into
>
>    Map(element -> (element, restriction))
>      .shuffle()
>      .FlatMap(someSdfInstance.process)
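
[Editor's note: a minimal plain-Python sketch of the expansion described above, with no Beam dependency. The function names `initial_restriction` and `sdf_process`, and the use of a random shuffle, are illustrative stand-ins, not Beam APIs.]

```python
import random

def initial_restriction(element):
    # For KafkaIO, "element" would be a KafkaSourceDescriptor and the
    # restriction an offset range for that topic/partition. Here we
    # just tag the element so the pairing step is visible.
    return ("offset_range", element)

def sdf_process(element, restriction):
    # A real SDF would read Kafka records within the restriction;
    # this stand-in emits a single marker record per pair.
    yield ("record-from", element)

elements = ["topicA-0", "topicA-1", "topicB-0"]

# Step 1: Map each element to an (element, restriction) pair.
paired = [(e, initial_restriction(e)) for e in elements]

# Step 2: the shuffle redistributes the pairs across workers;
# modelled here as a random reordering of the list.
random.shuffle(paired)

# Step 3: FlatMap invokes the SDF's process method on each pair.
output = [rec for (e, r) in paired for rec in sdf_process(e, r)]
```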
>
> On Mon, Nov 22, 2021 at 5:28 PM Antonio Si <[email protected]> wrote:
> >
> > Thanks Robert. Do you mind pointing me to the code that shuffles the
> > restrictions and assigns them to the workers?
> > I traced through the code where it loops through the topic partitions
> > and, for each, creates a KafkaSourceDescriptor and calls
> > ReadFromKafkaDoFn.initialRestriction(). But I couldn't find where they
> > get shuffled and assigned to the workers.
> >
> > Thank you very much.
> >
> > Antonio.
> >
> > On Mon, Nov 22, 2021 at 5:05 PM Robert Bradshaw <[email protected]>
> wrote:
> >>
> >> The source code can be found at
> >>
> https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
> >>
> >> In a nutshell, the full set of (topic, partitionIndex) pairs gets
> >> shuffled and assigned to the workers randomly, and each pair gets
> >> processed by a SplittableDoFn that emits the actual data on that
> >> topic/partition.
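
[Editor's note: a minimal sketch of the idea above. Building the full set of (topic, partitionIndex) pairs is faithful to the description; modelling the post-shuffle worker assignment as a hash is only an illustration, not Beam's actual scheduling.]

```python
NUM_WORKERS = 4

# Example data: topic -> partition count.
topics = {"orders": 3, "clicks": 2}

# Build the full set of (topic, partitionIndex) pairs that gets shuffled.
pairs = [(topic, p) for topic, n in topics.items() for p in range(n)]

# After the shuffle, each pair lands on some worker; a hash-based
# assignment stands in here for the runner's actual placement.
assignment = {pair: hash(pair) % NUM_WORKERS for pair in pairs}
```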
> >>
> >> On Mon, Nov 22, 2021 at 12:07 PM Antonio Si <[email protected]>
> wrote:
> >> >
> >> > Hello Beam community,
> >> >
> >> > I am experimenting with the new SDF KafkaIO in a bit more detail. I
> >> > have a quick question: how do the topics and partitions get assigned
> >> > to each Task Manager?
> >> >
> >> > Can someone point me to the code?
> >> >
> >> > Thanks in advance.
> >> >
> >> > Antonio.
>