[ 
https://issues.apache.org/jira/browse/BEAM-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916879#comment-16916879
 ] 

Kyle Winkelman commented on BEAM-2466:
--------------------------------------

Hey everyone,

I have been working on my own POC: 
[https://github.com/kyle-winkelman/beam/tree/kafka-streams]

I'm getting closer. Simple stuff is working. Definitely have a lot of work 
ahead.

Some info:
 * Went for Impulse + SplittableDoFn for reading all sources (plan to do a 
custom implementation for KafkaIO)
 * GroupByKey uses KStream.through to get all KVs with the same key to the same 
nodes then uses KeyedWorkItems and SystemReduceFn to groupAlsoByWindow
 * ParDo uses a Transformer to transform input records and a Punctuator to 
evaluate timers.
 * SideInputs/PCollectionViews will be created by writing the data to a topic 
with the key being the StateNamespace.stringKey() that topic will then be read 
into a GlobalKTable and Materialized into a KeyValueStore. When a ParDo 
accesses the SideInput, we will give the Transformer access to the 
KeyValueStore that has the data. A call to get will look up the value for the 
corresponding window in the SideInput.
 * StateInternals and TimerInternals are materialized in KeyValueStores.

I plan to keep working away at it but if anyone is interested in collaborating 
the help will be much appreciated.

> Add Kafka Streams runner
> ------------------------
>
>                 Key: BEAM-2466
>                 URL: https://issues.apache.org/jira/browse/BEAM-2466
>             Project: Beam
>          Issue Type: Wish
>          Components: runner-ideas
>            Reporter: Lorand Peter Kasler
>            Assignee: Kai Jiang
>            Priority: Minor
>
> Kafka Streams (https://kafka.apache.org/documentation/streams)  has more and 
> more features that could make it a viable candidate for a streaming runner. 
> It uses DataFlow-like model



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

Reply via email to