That's a good question... We are still in the design phase for this feature.
Initially I would have said that replicated in-memory is what we want. However, Flink is aiming to support long running stream analytics (weeks, months, ...) and it would be bad if state collected over such a long time would be lost. So some kind of disk persistence would be good for certain use cases. 2015-04-29 1:28 GMT+02:00 Dmitriy Setrakyan <dsetrak...@apache.org>: > On Tue, Apr 28, 2015 at 5:55 PM, Fabian Hueske <fhue...@gmail.com> wrote: > > > Thanks Cos for starting this discussion, hi to the Ignite community! > > > > The probably easiest and most straightforward integration of Flink and > > Ignite would be to go through Ignite's IGFS. Flink can be easily extended > > to support additional filesystems. > > > > However, the Flink community is currently also looking for a solution to > > checkpoint operator state of running stream processing programs. Flink > > processes data streams in real time similar to Storm, i.e., it schedules > > all operators of a streaming program and data is continuously flowing > from > > operator to operator. Instead of acknowledging each individual record, > > Flink injects stream offset markers into the stream in regular intervals. > > Whenever, an operator receives such a marker it checkpoints its current > > state (currently to the master with some limitations). In case of a > > failure, the stream is replayed (using a replayable source such as Kafka) > > from the last checkpoint that was not received by all sink operators and > > all operator states are reset to that checkpoint. > > We had already looked at Ignite and were wondering whether Ignite could > be > > used to reliably persist the state of streaming operator. > > > > Fabian, do you need these checkpoints stored in memory (with optional > redundant copies, or course) or on disk? I think in-memory makes a lot more > sense from performance standpoint, and can easily be done in Ignite. > > > > > > The other points I mentioned on Twitter are just rough ideas at the > moment. > > > > Cheers, Fabian > > > > 2015-04-29 0:23 GMT+02:00 Dmitriy Setrakyan <dsetrak...@apache.org>: > > > > > Thanks Cos. > > > > > > Hello Flink Community. > > > > > > From Ignite standpoint we definitely would be interested in providing > > Flink > > > processing API on top of Ignite Data Grid or IGFS. It would be > > interesting > > > to hear what steps would be required for such integration or if there > are > > > other integration points. > > > > > > D. > > > > > > On Tue, Apr 28, 2015 at 2:57 PM, Konstantin Boudnik <c...@apache.org> > > > wrote: > > > > > > > Following the lively exchange in Twitter (sic!) I would like to bring > > > > together > > > > Ignite and Flink communities to discuss the benefits of the > integration > > > and > > > > see where we can start it. > > > > > > > > We have this recently opened ticket > > > > https://issues.apache.org/jira/browse/IGNITE-813 > > > > > > > > and Fabian has listed the following points: > > > > > > > > 1) data store > > > > 2) parameter server for ML models > > > > 3) Checkpointing streaming op state > > > > 4) continuously updating views from streams > > > > > > > > I'd add > > > > 5) using Ignite IGFS to speed up Flink's access to HDFS data. > > > > > > > > I see a lot of interesting correlations between two projects and > wonder > > > if > > > > Flink guys can step up with a few thoughts on where Flink can benefit > > the > > > > most > > > > from Ignite's in-memory fabric architecture? Perhaps, it can be used > as > > > > in-memory storage where the other components of the stack can quickly > > > > access > > > > and work w/ the data w/o a need to dump it back to slow storage? > > > > > > > > Thoughts? > > > > Cos > > > > > > > > > >