Hi,

This is a very interesting proposal! I read your comment about side inputs and I tend to agree, though I think that side inputs don’t have to be strictly streams. It’s easy to imagine a Beam where a side input is backed by an external system and accessing the side input simply reads through to that system. In this world, it would be somewhat hard to reason about side-input availability and to make sure that main input is only processed once the side input is available. Though it’s not unsolvable, I think.
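To make the availability concern concrete, here is a minimal sketch in plain Java. It is not Beam API code: `ExternalSideInput`, `FakeExternalStore` and `GatedProcessor` are hypothetical names, and it assumes the external system can report whether it is ready. Main-input elements are simply buffered until the side input becomes available.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ExternalSideInputSketch {

    // Hypothetical: a side input that reads through to an external key-value system.
    interface ExternalSideInput {
        boolean isAvailable();      // has the external system signalled readiness?
        String lookup(String key);  // read-through access, nothing materialized locally
    }

    // Hypothetical in-memory stand-in for the external system.
    static class FakeExternalStore implements ExternalSideInput {
        private final Map<String, String> data = new ConcurrentHashMap<>();
        private volatile boolean ready = false;

        void put(String k, String v) { data.put(k, v); }
        void markReady() { ready = true; }

        public boolean isAvailable() { return ready; }
        public String lookup(String key) { return data.getOrDefault(key, "<missing>"); }
    }

    // Main-input processor that holds elements back until the side input is available.
    static class GatedProcessor {
        private final ExternalSideInput side;
        private final List<String> pending = new ArrayList<>();
        final List<String> output = new ArrayList<>();

        GatedProcessor(ExternalSideInput side) { this.side = side; }

        void process(String element) {
            if (!side.isAvailable()) {
                pending.add(element);  // side input not ready yet: buffer the element
                return;
            }
            flushPending();            // emit anything buffered earlier, in arrival order
            emit(element);
        }

        // Re-checks availability and emits buffered elements if the side input is ready.
        void flushPending() {
            if (!side.isAvailable()) return;
            for (String e : pending) emit(e);
            pending.clear();
        }

        private void emit(String element) {
            output.add(element + "=" + side.lookup(element));
        }
    }

    public static void main(String[] args) {
        FakeExternalStore store = new FakeExternalStore();
        GatedProcessor p = new GatedProcessor(store);

        p.process("a");  // buffered: store not ready yet
        store.put("a", "1");
        store.put("b", "2");
        store.markReady();
        p.process("b");  // flushes "a", then emits "b"

        System.out.println(p.output);  // prints [a=1, b=2]
    }
}
```

Buffering like this trades memory for correctness, and something still has to call `flushPending()` when availability changes (for example on a timer), which the sketch leaves to the caller.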
What I like about your solution is that it is implementable as a DoFn, without any special support from the runners. However, I think that in the Flink Runner it should be possible to execute this with the Async I/O operator and therefore get asynchronous access to the external system. I also think that this is not always better than batching, though.

Best,
Aljoscha

> On 3. Jul 2017, at 04:36, JingsongLee <[email protected]> wrote:
>
> Hi all:
> In some scenarios, the user needs to query some information from an external KV
> store in the pipeline. I think we can have a good abstraction that lets users
> deal with as few of the underlying details as possible. Here is a doc describing
> this proposal; I would like to receive your feedback.
> https://docs.google.com/document/d/1B-XnUwXh64lbswRieckU0BxtygSV58hysqZbpZmk03A/edit?usp=sharing
>
> Best,
> Jingsong Lee
>
