Hey all,
I'm currently in a situation where I have a single Kafka topic whose data
is spread across multiple partitions and comes from multiple sources. I'm
trying to figure out whether there's a way to read from these different
sources as separate pipelines, and whether a Splittable DoFn can do this.
Basically, what I'd like to do is, for a given key on a record, treat it
as a separate pipeline reading from Kafka:
testPipeline
    .apply(
        /*
          Apply some function here to describe to Kafka how to split up
          the sources that I want to read from
        */
    )
    .apply("Read from Kafka", KafkaIO.read(...))
    .apply("Remaining Pipeline Omitted for Brevity", ...);
Is it possible to do this? I'm trying to avoid a major architectural
change that would require separate topics per source; however, if I can
guarantee that a given key (and its associated watermark) is treated
separately, that would be ideal.
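
For illustration, something like the sketch below is the closest I know
how to express with the APIs I'm aware of (the broker, topic, and source
names are just placeholders): a single KafkaIO read of the shared topic,
with the per-source split done downstream via Partition. As far as I
understand, though, everything still shares a single watermark:

// Rough illustration only; all names below are placeholders.
// import org.apache.beam.sdk.io.kafka.KafkaIO;
// import org.apache.beam.sdk.transforms.Partition;
// import org.apache.beam.sdk.values.KV;
// import org.apache.beam.sdk.values.PCollectionList;
// import org.apache.kafka.common.serialization.StringDeserializer;
PCollectionList<KV<String, String>> bySource =
    testPipeline
        .apply("Read from Kafka",
            KafkaIO.<String, String>read()
                .withBootstrapServers("broker:9092")            // placeholder
                .withTopic("all-sources-topic")                 // placeholder
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                .withoutMetadata())                             // just KV<key, value>
        .apply("Split by source key",
            Partition.of(2, (KV<String, String> record, int numPartitions) ->
                "source-a".equals(record.getKey()) ? 0 : 1));   // placeholder split

What I'd love instead is for that split (and the watermark tracking) to
happen at the source itself, which is why I was wondering whether a
Splittable DoFn could help here.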
Any advice or recommendations for a strategy that might work would be
helpful!
Thanks,
Rion