Yes, I was thinking the same thing about side inputs. Our current IOs don't
support "seeking" and we could make HBaseIO/JdbcIO/... become seekable by
key+window which would allow a Runner to optimize the Read + SideInput into
any kind of deferred lookup when it's accessed as a side input instead of
loading it all into state. The Runner could interrogate the properties of
the "seekable" IO to see if it's compatible with what the user is doing
before performing the optimization. Granted, I believe it will be difficult
to express when something becomes available, how to handle updates to the
external store, etc.
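
To make the idea a bit more concrete, here is a rough sketch of what
"seekable by key+window" plus a Runner-side check could look like. None of
this exists in Beam today: SeekableSource, InMemorySeekableSource,
SideInputStrategy and SideInputPlanner are made-up names, the window is
simplified to a String, and an in-memory map stands in for HBase/JDBC.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical interface: an IO that can be queried by key + window
// instead of being read in full.
interface SeekableSource<K, V> {
  // Look up a single value without scanning the whole source.
  Optional<V> seek(K key, String window);

  // A property the Runner can interrogate before optimizing.
  boolean supportsWindowedLookup();
}

// In-memory stand-in for an external store (assumption: real IOs would
// issue a point lookup against HBase, a JDBC table, etc.).
class InMemorySeekableSource implements SeekableSource<String, String> {
  private final Map<List<String>, String> store = new HashMap<>();

  void put(String key, String window, String value) {
    store.put(Arrays.asList(window, key), value);
  }

  @Override
  public Optional<String> seek(String key, String window) {
    return Optional.ofNullable(store.get(Arrays.asList(window, key)));
  }

  @Override
  public boolean supportsWindowedLookup() {
    return true;
  }
}

// The two ways a Runner could serve the side input.
enum SideInputStrategy { DEFERRED_LOOKUP, MATERIALIZE_INTO_STATE }

// Hypothetical Runner-side decision: only defer lookups when the source's
// properties are compatible with what the pipeline needs.
class SideInputPlanner {
  static <K, V> SideInputStrategy plan(
      SeekableSource<K, V> source, boolean pipelineNeedsWindowedLookup) {
    if (!pipelineNeedsWindowedLookup || source.supportsWindowedLookup()) {
      return SideInputStrategy.DEFERRED_LOOKUP;
    }
    // Fall back to today's behavior: read everything and keep it in state.
    return SideInputStrategy.MATERIALIZE_INTO_STATE;
  }
}
```

The user's pipeline stays the same in both cases; only the strategy chosen
by the Runner differs. The sketch deliberately ignores the hard parts
mentioned above (availability and updates to the external store).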

What I like about modelling it as seekable IOs + Runner optimization is
that users don't need to change their pipeline to get benefits when they
upgrade to newer versions of Apache Beam.

On Tue, Jul 4, 2017 at 9:48 AM, Aljoscha Krettek <aljos...@apache.org>
wrote:

> Hi,
>
> This is a very interesting proposal! I read your comment about side inputs
> and I tend to agree, though I think that side inputs don’t have to be
> strictly streams. It’s easily possible to imagine a Beam where a side input
> can be based on an external system and accessing side input simply goes
> through to the external system. In this world, it would be somewhat hard to
> reason about side input availability and to make sure that main input is
> only processed when the side input is available. Though it’s not
> unsolvable, I think.
>
> What I like about your solution is that it is implementable as a DoFn,
> without any special support by the Runners. However, I think that in the
> Flink Runner it should be possible to execute this with the Async I/O
> operator and therefore get asynchronous accesses to the external system. I
> also think that this is not always better than batching, though.
>
> Best,
> Aljoscha
> > On 3. Jul 2017, at 04:36, JingsongLee <lzljs3620...@aliyun.com> wrote:
> >
> > Hi all:
> > In some scenarios, the user needs to query some information from an
> > external KV store in the pipeline. I think we can have a good abstraction
> > that exposes users to as little of the underlying details as possible.
> > Here is a doc for this proposal; I would like to receive your
> > feedback.
> > https://docs.google.com/document/d/1B-XnUwXh64lbswRieckU0BxtygSV58hysqZbpZmk03A/edit?usp=sharing
> > Best, Jingsong Lee
> >
>
>
