[
https://issues.apache.org/jira/browse/BEAM-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916879#comment-16916879
]
Kyle Winkelman commented on BEAM-2466:
--------------------------------------
Hey everyone,
I have been working on my own POC:
[https://github.com/kyle-winkelman/beam/tree/kafka-streams]
I'm getting closer. Simple stuff is working. Definitely have a lot of work
ahead.
Some info:
* Went for Impulse + SplittableDoFn for reading all sources (plan to do a
custom implementation for KafkaIO)
* GroupByKey uses KStream.through to get all KVs with the same key to the same
nodes then uses KeyedWorkItems and SystemReduceFn to groupAlsoByWindow
* ParDo uses a Transformer to transform input records and a Punctuator to
evaluate timers.
* SideInputs/PCollectionViews will be created by writing the data to a topic
with the key being the StateNamespace.stringKey() that topic will then be read
into a GlobalKTable and Materialized into a KeyValueStore. When a ParDo
accesses the SideInput, we will give the Transformer access to the
KeyValueStore that has the data. A call to get will look up the value for the
corresponding window in the SideInput.
* StateInternals and TimerInternals are materialized in KeyValueStores.
I plan to keep working away at it but if anyone is interested in collaborating
the help will be much appreciated.
> Add Kafka Streams runner
> ------------------------
>
> Key: BEAM-2466
> URL: https://issues.apache.org/jira/browse/BEAM-2466
> Project: Beam
> Issue Type: Wish
> Components: runner-ideas
> Reporter: Lorand Peter Kasler
> Assignee: Kai Jiang
> Priority: Minor
>
> Kafka Streams (https://kafka.apache.org/documentation/streams) has more and
> more features that could make it a viable candidate for a streaming runner.
> It uses DataFlow-like model
--
This message was sent by Atlassian Jira
(v8.3.2#803003)