Related JIRA: https://issues.apache.org/jira/browse/BEAM-1197

On Wed, Aug 30, 2017 at 11:28 AM Mingmin Xu <[email protected]> wrote:

> Besides data size, I think data refresh is the BIG barrier, especially
> for streaming jobs. In most cases the lookup data set is updated
> periodically while the streaming job is running. I like the idea of
> SeekableIO; it can still be integrated with ExternalKvStore, as a
> lower-level API.
>
> On Mon, Aug 28, 2017 at 7:32 PM, JingsongLee <[email protected]>
> wrote:
>
> > Yes, the runner can hold the entire side input in the right way. But
> > that would be somewhat wasteful with large amounts of data.
> > Best, Jingsong Lee
> >
> >
> > ------------------------------------------------------------------
> > From: Lukasz Cwik <[email protected]>
> > Time: 2017 Aug 25 (Fri) 23:26
> > To: dev <[email protected]>
> > Cc: JingsongLee <[email protected]>
> > Subject: Re: [PROPOSAL] External Join with KV Stores
> >
> > Jingsong, what do you mean when you say the batch data is too large?
> >
> > To my knowledge, nothing requires an SDK/runner to hold the entire side
> > input in memory. Lists, maps, iterables, ... can all be broken up into
> > smaller segments which can be loaded, cached and discarded separately.
> >
> > On Thu, Aug 24, 2017 at 5:10 PM, Mingmin Xu <[email protected]> wrote:
> >
> > > I want to bring this thread up again, as we're looking for a similar
> > > feature in SQL. Please point me to it if something already exists; I
> > > couldn't find any JIRA task.
> > >
> > > Currently the streaming+batch and batch+batch joins are implemented
> > > with sideInput. That is not a one-size-fits-all solution: as Jingsong
> > > mentioned, the batch data may be too large, and it may change
> > > periodically. A userland PTransform sounds like a more
> > > straightforward option, as it doesn't require support at the runner
> > > level.
> > >
> > > Mingmin
> > >
> > > On Mon, Jul 17, 2017 at 8:56 PM, JingsongLee <[email protected]>
> > > wrote:
> > >
> > > > Sorry for taking so long to reply.
> > > > Hi Aljoscha, I think the Async I/O operator and Batch are the same,
> > > > and Async is the better interface. Asynchronous use may be more
> > > > appropriate for all IO-related operations. Just as you said, at the
> > > > beginning this would need no special support from the runners.
> > > > I really like Luke's idea: let the user see a SeekableRead +
> > > > SideInput interface, and let the runner layer optimize it into
> > > > direct access to the external store. This requires a suitable
> > > > SeekableRead interface and more effective compiler optimization.
> > > > Kenn's idea is exciting. If we can have an interface similar to
> > > > FileSystem (maybe like SeekableRead) that abstracts and unifies
> > > > access to multiple KV stores, users would see only the Beam concept
> > > > rather than the specific KV store.
> > > > Best, Jingsong Lee
> > > > ------------------------------------------------------------------
> > > > From: Kenneth Knowles <[email protected]>
> > > > Time: 2017 Jul 7 (Fri) 11:43
> > > > To: dev <[email protected]>
> > > > Cc: JingsongLee <[email protected]>
> > > > Subject: Re: [PROPOSAL] External Join with KV Stores
> > > >
> > > > In the streams/tables way of talking, side inputs are tables.
> > > > External KV stores are basically also [globally windowed] tables.
> > > > Both are time-varying.
> > > >
> > > > I think it makes perfect sense to access an external KV store in
> > > > userland directly rather than listen to its changelog and reproduce
> > > > the same table as a multimap side input. I'm sure many users are
> > > > already doing this. I'm sure users will always do this. Providing a
> > > > common interface (simpler than Filesystem) and helpful transform(s)
> > > > in an extension module seems nice. Does it require any support in
> > > > the core SDK?
> > > >
> > > > If I understand, Luke & Robert, you favor adding metadata to
> > > > Read/SDF so that a user _does_ write it as a changelog listener that
> > > > is observed as a multimap side input, and each runner optimizes it
> > > > if they can to just directly access the KV store? A runner is free
> > > > to use any kind of storage they like to materialize a side input
> > > > anyhow, so this is surely possible, but it is a "sufficiently smart
> > > > compiler" issue. As for semantics, I'm not worried about
> > > > availability - it is globally windowed and always available. But I
> > > > think this requires retractions to be correctly equivalent to direct
> > > > access.
> > > >
> > > > I think we can have a userland PTransform in much less time than a
> > > > model concept, so I favor it.
> > > >
> > > > Kenn
> > > >
> > > >
> > >
> > >
> > > --
> > > ----
> > > Mingmin
> > >
> >
> >
>
>
> --
> ----
> Mingmin
>
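
For what it's worth, here is a rough sketch of the userland approach discussed in the quoted thread: look keys up in the external store in batches, and cache results with a TTL so that periodic updates to the lookup data are eventually picked up. It is plain standard-library Python, not Beam code. Every name in it (InMemoryKvStore, multi_get, LookupJoin, batch_size, ttl_seconds) is hypothetical and made up for illustration; none of it is an existing Beam API or the design proposed in the JIRA.

```python
import time


class InMemoryKvStore:
    """Hypothetical stand-in for an external KV store client."""

    def __init__(self, data):
        self.data = dict(data)
        self.calls = 0  # counts round trips, only for illustration

    def multi_get(self, keys):
        """Return the values for the keys that exist, in one round trip."""
        self.calls += 1
        return {k: self.data[k] for k in keys if k in self.data}


class LookupJoin:
    """Left-joins (key, value) elements against a KV store.

    Elements are buffered into batches so each batch needs at most one
    multi_get call, and results are cached for ttl_seconds so refreshed
    lookup data is re-read after the entry expires.
    """

    def __init__(self, store, batch_size=2, ttl_seconds=60.0,
                 clock=time.monotonic):
        self.store = store
        self.batch_size = batch_size
        self.ttl = ttl_seconds
        self.clock = clock
        self._cache = {}  # key -> (looked_up_value, fetched_at)

    def _is_fresh(self, key, now):
        entry = self._cache.get(key)
        return entry is not None and now - entry[1] < self.ttl

    def _flush(self, batch):
        now = self.clock()
        # Distinct keys in this batch that are absent from the cache or expired.
        stale = list(dict.fromkeys(
            k for k, _ in batch if not self._is_fresh(k, now)))
        if stale:
            found = self.store.multi_get(stale)
            for k in stale:
                # Keys missing from the store are cached as None.
                self._cache[k] = (found.get(k), now)
        for k, v in batch:
            yield k, v, self._cache[k][0]

    def process(self, elements):
        """Yield (key, value, looked_up_value) for each input element."""
        batch = []
        for element in elements:
            batch.append(element)
            if len(batch) >= self.batch_size:
                yield from self._flush(batch)
                batch = []
        if batch:
            yield from self._flush(batch)


store = InMemoryKvStore({"a": 1, "b": 2})
join = LookupJoin(store, batch_size=2, ttl_seconds=60.0)
joined = list(join.process([("a", "x"), ("b", "y"), ("c", "z")]))
```

In a real pipeline the batching and flushing would presumably live in a DoFn (flushing at the end of a bundle), and the TTL is only one possible answer to the refresh concern raised above; it trades staleness for fewer store round trips.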
