Hey all,
I'm currently in a situation where I have a single Kafka topic whose data
is spread across multiple partitions and comes from multiple sources. I'm
trying to figure out whether there's a way to read from these different
sources as separate pipelines, and whether a Splittable DoFn can do this.
Basically, what I'd like to do is, for a given key on a record, treat it
as a separate pipeline reading from Kafka:
testPipeline
    .apply(
        /*
          Apply some function here to describe to Kafka how to split up
          the sources that I want to read from
        */
    )
    .apply("Read from Kafka", KafkaIO.read(...))
    .apply("Remaining Pipeline Omitted for Brevity", ...);
Is it possible to do this? I'm trying to avoid a major architectural
change that would require separate topics per source; however, if I can
guarantee that a given key (and its associated watermark) is treated
separately, that would be ideal.
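
For illustration, something like the sketch below is the closest I know
how to express with the APIs I'm aware of (the broker, topic, and source
names are just placeholders): a single KafkaIO read of the shared topic,
with the per-source split done downstream via Partition. As far as I
understand, though, everything still shares a single watermark:

// Rough illustration only; all names below are placeholders.
// import org.apache.beam.sdk.io.kafka.KafkaIO;
// import org.apache.beam.sdk.transforms.Partition;
// import org.apache.beam.sdk.values.KV;
// import org.apache.beam.sdk.values.PCollectionList;
// import org.apache.kafka.common.serialization.StringDeserializer;
PCollectionList<KV<String, String>> bySource =
    testPipeline
        .apply("Read from Kafka",
            KafkaIO.<String, String>read()
                .withBootstrapServers("broker:9092")            // placeholder
                .withTopic("all-sources-topic")                 // placeholder
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                .withoutMetadata())                             // just KV<key, value>
        .apply("Split by source key",
            Partition.of(2, (KV<String, String> record, int numPartitions) ->
                "source-a".equals(record.getKey()) ? 0 : 1));   // placeholder split

What I'd love instead is for that split (and the watermark tracking) to
happen at the source itself, which is why I was wondering whether a
Splittable DoFn could help here.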
Any advice or recommendations for a strategy that might work would be
helpful!
Thanks,
Rion