Hi Kyle,
this is very cool. For others, here's the link to compare Kyle's branch to
Beam's master[1].

I don't have a lot of experience writing runners, so someone else should
look as well, but I'll try to take a look at your branch in the next few
days, because I think it's pretty cool : )


[1]
https://github.com/apache/beam/compare/master...kyle-winkelman:kafka-streams

On Thu, Sep 24, 2020 at 9:53 AM Kyle Winkelman <[email protected]>
wrote:

> Hello everyone,
>
>
>
> Jira:
>
> https://issues.apache.org/jira/browse/BEAM-2466
>
>
>
> My Branch:
>
> https://github.com/kyle-winkelman/beam/tree/kafka-streams
>
>
>
> I have taken an initial pass at creating a Kafka Streams Runner. It passes
> nearly all of the @ValidatesRunner tests. I am hoping to find someone that
> has experience writing a Runner to take a look at it and give me some
> feedback before I open a PR.
>
>
>
> I am using the Kafka Streams DSL
> <https://kafka.apache.org/26/documentation/streams/developer-guide/dsl-api.html>,
> so a PCollection is equivalent to a KStream
> <https://kafka.apache.org/26/javadoc/org/apache/kafka/streams/kstream/KStream.html>
> .
>
>
>
> ParDo:
>
>   - implements Transformer
> <https://kafka.apache.org/26/javadoc/org/apache/kafka/streams/kstream/Transformer.html>
>  to KStream#transform
> <https://kafka.apache.org/26/javadoc/org/apache/kafka/streams/kstream/KStream.html#transform-org.apache.kafka.streams.kstream.TransformerSupplier-org.apache.kafka.streams.kstream.Named-java.lang.String...->
>
>   - outputs KStream<TupleTag<?>, WindowedValue<?>> then uses a
> KStream#branch
> <https://kafka.apache.org/26/javadoc/org/apache/kafka/streams/kstream/KStream.html#branch-org.apache.kafka.streams.kstream.Predicate...->
>  to
> handle Pardo.MultiOutput
>
>   - schedules a Punctuator
> <https://kafka.apache.org/26/javadoc/org/apache/kafka/streams/processor/Punctuator.html>
>  to
> periodically start/finish bundles
>
>
>
> GroupByKey:
>
>   - KStream#repartition
> <https://kafka.apache.org/26/javadoc/org/apache/kafka/streams/kstream/KStream.html#repartition-->
>  the
> data to groupOnlyByKey
>
>   - groupAlsoByWindow similarly to ParDo, runs ReduceFn instead of DoFn
>
>
>
> Flatten:
>
>   - KStream#merge
> <https://kafka.apache.org/26/javadoc/org/apache/kafka/streams/kstream/KStream.html#merge-org.apache.kafka.streams.kstream.KStream->
>
>
>
> Combine:
>
>   - not as composite, but as primitive GroupByKey/ParDo
>
>
>
> Composite Transforms:
>
>   - inlining (I believe)
>
>
>
> Side Inputs:
>
>   - write data to a topic with the key being the StateNamespace.stringKey()
>
>   - read topic into a GlobalKTable
> <https://kafka.apache.org/26/javadoc/org/apache/kafka/streams/kstream/GlobalKTable.html>
>  and
> materialize it into a KeyValueStore
> <https://kafka.apache.org/26/javadoc/org/apache/kafka/streams/state/KeyValueStore.html>
>
>   - in ParDo, access the KeyValueStore, via the ProcessorContext
> <https://kafka.apache.org/26/javadoc/org/apache/kafka/streams/processor/ProcessorContext.html>,
> and use the WindowFn to get the Side Input Window and its
> StateNamespace.stringKey() to look up the value
>
>
>
> Source API:
>
>   - overridden by SplittableDoFn, via
> Read.SPLITTABLE_DOFN_PREFERRED_RUNNERS
>
>
>
> Impulse:
>
>   - if not exists, create a topic in Kafka and use a KafkaProducer
> <https://kafka.apache.org/26/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html>
>  to
> generate the initial record
>
>   - generate a KStream from the above topic
>
>
>
> SplittableDoFn:
>
>   - overriden by SplittableParDo/SplittableParDoViaKeyedWorkItems
>
>   - extends Stateful Processing, runs ProcessFn instead of DoFn and
> Punctuator fires timers as expected by the ProcessFn
>
>   - write watermarks to a separate topic to be aggregated and read into a
> GlobalKTable and materialize it into a KeyValueStore
>
>
>
> Stateful Processing:
>
>   - StateInternals and TimerInternals are materialized in KeyValueStores
>
>   - extends ParDo, with statefulDoFnRunner and Punctuator advances
> TimerInternals (via watermarks KeyValueStore) and fire timers
>
>
> Thanks,
>
> Kyle Winkelman
>

Reply via email to