On Tue, Apr 28, 2015 at 5:55 PM, Fabian Hueske <fhue...@gmail.com> wrote:
> Thanks Cos for starting this discussion, hi to the Ignite community!
>
> Probably the easiest and most straightforward integration of Flink and
> Ignite would be to go through Ignite's IGFS. Flink can be easily extended
> to support additional filesystems.
>
> However, the Flink community is currently also looking for a solution to
> checkpoint operator state of running stream processing programs. Flink
> processes data streams in real time similar to Storm, i.e., it schedules
> all operators of a streaming program and data is continuously flowing from
> operator to operator. Instead of acknowledging each individual record,
> Flink injects stream offset markers into the stream at regular intervals.
> Whenever an operator receives such a marker, it checkpoints its current
> state (currently to the master with some limitations). In case of a
> failure, the stream is replayed (using a replayable source such as Kafka)
> from the last checkpoint that was not received by all sink operators and
> all operator states are reset to that checkpoint.
> We had already looked at Ignite and were wondering whether Ignite could be
> used to reliably persist the state of streaming operators.
>

Fabian, do you need these checkpoints stored in memory (with optional
redundant copies, of course) or on disk? I think in-memory makes a lot more
sense from a performance standpoint, and can easily be done in Ignite.

>
> The other points I mentioned on Twitter are just rough ideas at the moment.
>
> Cheers, Fabian
>
> 2015-04-29 0:23 GMT+02:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
>
> > Thanks Cos.
> >
> > Hello Flink Community.
> >
> > From Ignite standpoint we definitely would be interested in providing
> > Flink processing API on top of Ignite Data Grid or IGFS. It would be
> > interesting to hear what steps would be required for such integration
> > or if there are other integration points.
> >
> > D.
> >
> > On Tue, Apr 28, 2015 at 2:57 PM, Konstantin Boudnik <c...@apache.org>
> > wrote:
> >
> > > Following the lively exchange in Twitter (sic!) I would like to bring
> > > together Ignite and Flink communities to discuss the benefits of the
> > > integration and see where we can start it.
> > >
> > > We have this recently opened ticket
> > >   https://issues.apache.org/jira/browse/IGNITE-813
> > >
> > > and Fabian has listed the following points:
> > >
> > >  1) data store
> > >  2) parameter server for ML models
> > >  3) Checkpointing streaming op state
> > >  4) continuously updating views from streams
> > >
> > > I'd add
> > >  5) using Ignite IGFS to speed up Flink's access to HDFS data.
> > >
> > > I see a lot of interesting correlations between two projects and wonder
> > > if Flink guys can step up with a few thoughts on where Flink can benefit
> > > the most from Ignite's in-memory fabric architecture? Perhaps, it can be
> > > used as in-memory storage where the other components of the stack can
> > > quickly access and work w/ the data w/o a need to dump it back to slow
> > > storage?
> > >
> > > Thoughts?
> > >   Cos
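
To make the checkpointing question more concrete, here is a minimal sketch of
the marker-based scheme described in the quoted message, under heavy
simplifying assumptions: a single stateful operator (a running sum), a Python
list standing in for a replayable source such as Kafka, and a plain in-memory
dictionary standing in for whatever store would hold the checkpoints. The
names CheckpointStore, run, interval and fail_at are hypothetical and are not
part of either the Flink or the Ignite API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class CheckpointStore:
    """Hypothetical in-memory checkpoint store (a stand-in, not a real API)."""
    _data: Dict[int, Tuple[int, int]] = field(default_factory=dict)

    def put(self, checkpoint_id: int, offset: int, state: int) -> None:
        # Persist the source offset together with the operator state.
        self._data[checkpoint_id] = (offset, state)

    def latest(self) -> Optional[Tuple[int, int]]:
        # Return (offset, state) of the newest checkpoint, or None if empty.
        if not self._data:
            return None
        return self._data[max(self._data)]


def run(records: List[int], store: CheckpointStore, interval: int,
        fail_at: Optional[int] = None) -> int:
    """Sum the records, taking a checkpoint every `interval` records.

    If `fail_at` is given, one crash is simulated just before the record at
    that offset is processed; the operator then restores the latest
    checkpoint and the source is replayed from the stored offset.
    """
    offset, state = 0, 0
    failed = False
    while offset < len(records):
        if fail_at is not None and not failed and offset == fail_at:
            failed = True
            # Crash: in-flight state is lost, fall back to the last
            # checkpoint (or to the very beginning if there is none).
            offset, state = store.latest() or (0, 0)
            continue
        state += records[offset]
        offset += 1
        if offset % interval == 0:
            # A marker "arrives": snapshot the offset and operator state.
            store.put(offset // interval, offset, state)
    return state


if __name__ == "__main__":
    data = list(range(1, 11))
    print(run(data, CheckpointStore(), interval=3))
    print(run(data, CheckpointStore(), interval=3, fail_at=7))
```

Both calls should produce the same sum, since records processed after the
last checkpoint are simply processed again on replay. In this sketch the
store only ever needs to hold one (offset, state) pair per checkpoint, which
is the part an in-memory store with redundant copies would replace.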