Re: [DISCUSS] Flink and Ignite integration

Dmitriy Setrakyan Tue, 28 Apr 2015 16:29:55 -0700

On Tue, Apr 28, 2015 at 5:55 PM, Fabian Hueske <fhue...@gmail.com> wrote:


> Thanks Cos for starting this discussion, hi to the Ignite community!
>
> The probably easiest and most straightforward integration of Flink and
> Ignite would be to go through Ignite's IGFS. Flink can be easily extended
> to support additional filesystems.
>
> However, the Flink community is currently also looking for a solution to
> checkpoint operator state of running stream processing programs. Flink
> processes data streams in real time similar to Storm, i.e., it schedules
> all operators of a streaming program and data is continuously flowing from
> operator to operator. Instead of acknowledging each individual record,
> Flink injects stream offset markers into the stream in regular intervals.
> Whenever, an operator receives such a marker it checkpoints its current
> state (currently to the master with some limitations). In case of a
> failure, the stream is replayed (using a replayable source such as Kafka)
> from the last checkpoint that was not received by all sink operators and
> all operator states are reset to that checkpoint.
> We had already looked at Ignite and were wondering whether Ignite could be
> used to reliably persist the state of streaming operator.
>

Fabian, do you need these checkpoints stored in memory (with optional
redundant copies, or course) or on disk? I think in-memory makes a lot more
sense from performance standpoint, and can easily be done in Ignite.


>
> The other points I mentioned on Twitter are just rough ideas at the moment.
>
> Cheers, Fabian
>
> 2015-04-29 0:23 GMT+02:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
>
> > Thanks Cos.
> >
> > Hello Flink Community.
> >
> > From Ignite standpoint we definitely would be interested in providing
> Flink
> > processing API on top of Ignite Data Grid or IGFS. It would be
> interesting
> > to hear what steps would be required for such integration or if there are
> > other integration points.
> >
> > D.
> >
> > On Tue, Apr 28, 2015 at 2:57 PM, Konstantin Boudnik <c...@apache.org>
> > wrote:
> >
> > > Following the lively exchange in Twitter (sic!) I would like to bring
> > > together
> > > Ignite and Flink communities to discuss the benefits of the integration
> > and
> > > see where we can start it.
> > >
> > > We have this recently opened ticket
> > >   https://issues.apache.org/jira/browse/IGNITE-813
> > >
> > > and Fabian has listed the following points:
> > >
> > >  1) data store
> > >  2) parameter server for ML models
> > >  3) Checkpointing streaming op state
> > >  4) continuously updating views from streams
> > >
> > > I'd add
> > >  5) using Ignite IGFS to speed up Flink's access to HDFS data.
> > >
> > > I see a lot of interesting correlations between two projects and wonder
> > if
> > > Flink guys can step up with a few thoughts on where Flink can benefit
> the
> > > most
> > > from Ignite's in-memory fabric architecture? Perhaps, it can be used as
> > > in-memory storage where the other components of the stack can quickly
> > > access
> > > and work w/ the data w/o a need to dump it back to slow storage?
> > >
> > > Thoughts?
> > >   Cos
> > >
> >
>

Re: [DISCUSS] Flink and Ignite integration

Reply via email to