Re: [DISCUSS] Flink and Ignite integration

Fabian Hueske Wed, 29 Apr 2015 03:29:01 -0700

That's a good question... We are still in the design phase for this feature.


Initially I would have said that replicated in-memory is what we want.
However, Flink is aiming to support long-running stream analytics (weeks,
months, ...) and it would be bad if state collected over such a long time
were lost. So some kind of disk persistence would be good for certain
use cases.
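
As a rough illustration of the marker-based checkpointing described in the
quoted message below, here is a minimal Python sketch. It is not Flink or
Ignite code: `Marker`, `InMemoryStore`, `CountingOperator`, `with_markers`
and `run` are hypothetical names, and the dict-backed store only stands in
for whichever backend (replicated in-memory or disk-backed) would hold the
snapshots. It models a single operator, so the "received by all sinks"
coordination is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    """Stream offset marker injected by the source (hypothetical name)."""
    offset: int  # number of records emitted before this marker


class InMemoryStore:
    """Stand-in for a checkpoint backend; an assumption, not Ignite's API."""

    def __init__(self):
        self._snapshots = {}

    def put(self, offset, state):
        # Copy the state so later updates do not alter the snapshot.
        self._snapshots[offset] = dict(state)

    def latest(self):
        """Return (offset, state) of the newest snapshot, or None."""
        if not self._snapshots:
            return None
        offset = max(self._snapshots)
        return offset, dict(self._snapshots[offset])


class CountingOperator:
    """Toy stateful operator: counts occurrences of each record."""

    def __init__(self, store):
        self.store = store
        self.counts = {}

    def process(self, item):
        if isinstance(item, Marker):
            # A marker arrived: checkpoint the current state.
            self.store.put(item.offset, self.counts)
        else:
            self.counts[item] = self.counts.get(item, 0) + 1

    def recover(self):
        """Reset state to the last checkpoint; return the replay offset."""
        snapshot = self.store.latest()
        if snapshot is None:
            self.counts = {}
            return 0
        offset, state = snapshot
        self.counts = state
        return offset


def with_markers(records, start, interval):
    """Replay records from `start`, injecting a Marker every `interval`."""
    for i in range(start, len(records)):
        yield records[i]
        if (i + 1) % interval == 0:
            yield Marker(i + 1)


def run(records, store, interval=2, crash_after=None):
    """Process the stream; optionally fail before the n-th stream item."""
    op = CountingOperator(store)
    start = op.recover()
    for n, item in enumerate(with_markers(records, start, interval)):
        if crash_after is not None and n == crash_after:
            raise RuntimeError("simulated failure")
        op.process(item)
    return op.counts
```

Calling `run(records, store, crash_after=7)` and then `run(records, store)`
with the same store replays only the records after the last stored marker,
so the final counts match a failure-free run. Whether the store keeps its
snapshots in replicated memory or on disk is exactly the open design
question; the operator logic would not change either way.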



2015-04-29 1:28 GMT+02:00 Dmitriy Setrakyan <dsetrak...@apache.org>:

> On Tue, Apr 28, 2015 at 5:55 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>
> > Thanks Cos for starting this discussion, hi to the Ignite community!
> >
> > Probably the easiest and most straightforward integration of Flink and
> > Ignite would be to go through Ignite's IGFS. Flink can be easily extended
> > to support additional filesystems.
> >
> > However, the Flink community is currently also looking for a solution to
> > checkpoint operator state of running stream processing programs. Flink
> > processes data streams in real time similar to Storm, i.e., it schedules
> > all operators of a streaming program and data is continuously flowing from
> > operator to operator. Instead of acknowledging each individual record,
> > Flink injects stream offset markers into the stream at regular intervals.
> > Whenever an operator receives such a marker, it checkpoints its current
> > state (currently to the master with some limitations). In case of a
> > failure, the stream is replayed (using a replayable source such as Kafka)
> > from the last checkpoint that was not received by all sink operators and
> > all operator states are reset to that checkpoint.
> > We had already looked at Ignite and were wondering whether Ignite could be
> > used to reliably persist the state of streaming operators.
> >
>
> Fabian, do you need these checkpoints stored in memory (with optional
> redundant copies, of course) or on disk? I think in-memory makes a lot more
> sense from a performance standpoint, and can easily be done in Ignite.
>
>
> >
> > The other points I mentioned on Twitter are just rough ideas at the moment.
> >
> > Cheers, Fabian
> >
> > 2015-04-29 0:23 GMT+02:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
> >
> > > Thanks Cos.
> > >
> > > Hello Flink Community.
> > >
> > > From the Ignite standpoint we would definitely be interested in providing
> > > a Flink processing API on top of Ignite Data Grid or IGFS. It would be
> > > interesting to hear what steps would be required for such an integration
> > > or if there are other integration points.
> > >
> > > D.
> > >
> > > On Tue, Apr 28, 2015 at 2:57 PM, Konstantin Boudnik <c...@apache.org>
> > > wrote:
> > >
> > > > Following the lively exchange on Twitter (sic!) I would like to bring
> > > > together the Ignite and Flink communities to discuss the benefits of
> > > > the integration and see where we can start it.
> > > >
> > > > We have this recently opened ticket
> > > >   https://issues.apache.org/jira/browse/IGNITE-813
> > > >
> > > > and Fabian has listed the following points:
> > > >
> > > >  1) data store
> > > >  2) parameter server for ML models
> > > >  3) Checkpointing streaming op state
> > > >  4) continuously updating views from streams
> > > >
> > > > I'd add
> > > >  5) using Ignite IGFS to speed up Flink's access to HDFS data.
> > > >
> > > > I see a lot of interesting correlations between the two projects and
> > > > wonder if the Flink guys can step up with a few thoughts on where Flink
> > > > can benefit the most from Ignite's in-memory fabric architecture?
> > > > Perhaps it can be used as in-memory storage where the other components
> > > > of the stack can quickly access and work w/ the data w/o a need to dump
> > > > it back to slow storage?
> > > >
> > > > Thoughts?
> > > >   Cos
> > > >
> > >
> >
>
