I like it quite a bit, as well! A ticket would make the most sense, too, so there will be a single place to collect the design docs (if needed), etc.

On Wed, Oct 21, 2015 at 04:45PM, Dmitriy Setrakyan wrote:
> I also really like the idea. One potential use case is fraud analysis in
> financial institutions. It rarely makes sense to perform such analysis on a
> live system; rather, a snapshot of some data needs to be taken and
> analyzed offline.
>
> I think snapshots should be saved to disk, so users could load them for
> analysis on a totally different cluster.

I think disk persistence should be optional, not mandatory.

Cos

> Raul, if you don’t mind, can you file a ticket and see if anyone in the
> community wants to pick it up?
>
> D.
>
> On Wed, Oct 21, 2015 at 5:51 AM, Sergi Vladykin <sergi.vlady...@gmail.com>
> wrote:
>
> > Raul,
> >
> > Actually, SQL indexes are already snapshotable. I'm not sure it makes
> > sense to make the whole cache (with full cache API support)
> > snapshotable, but I like your idea about running multiple SQL
> > statements against the same snapshot.
> >
> > Also, I don't think it is a good idea to keep snapshots for a long
> > time, so I'd prefer a typical AutoCloseable API like:
> >
> > try (Snapshot s = ...) {
> >     s.query(...);
> >     s.query(...);
> >     s.query(...);
> > }
> >
> > Though I'm not sure when we will be able to get down to this.
> >
> > Sergi
> >
> > 2015-10-21 12:06 GMT+03:00 Raul Kripalani <ra...@apache.org>:
> >
> > > Hey guys,
> > >
> > > LevelDB has a feature called Snapshots, which provides a consistent
> > > read-only view of the DB at a given point in time against which
> > > queries can be executed.
> > >
> > > To my knowledge, this functionality doesn't exist in the world of open
> > > source In-Memory Computing. Ignite could be an innovator here.
> > >
> > > Ignite Snapshots would allow queries, distributed closures, map-reduce
> > > jobs, etc. It could be useful for Spark RDDs, to keep the data from
> > > shifting while the computation is taking place (not sure if there's
> > > already some form of snapshotting, though). Same for IGFS.
> > >
> > > Example usage:
> > >
> > > IgniteCacheSnapshot snapshot =
> > >     ignite.cache("mycache").snapshots().create();
> > >
> > > // all three queries are executed against a view of the cache at the
> > > // point in time when it was snapshotted
> > > snapshot.query("select ...");
> > > snapshot.query("select ...");
> > > snapshot.query("select ...");
> > >
> > > In fact, it would be awesome to be able to logically save this snapshot
> > > with a name, so that later jobs, queries, etc. can run on top of it, e.g.:
> > >
> > > IgniteCacheSnapshot snapshot =
> > >     ignite.cache("mycache").snapshots().create("abc");
> > >
> > > // ...
> > > // in another module of a distributed system, or in another thread in
> > > // parallel, use the saved snapshot
> > > IgniteCacheSnapshot snapshot =
> > >     ignite.cache("mycache").snapshots().get("abc");
> > > ....
> > >
> > > Named snapshots can be dangerous due to data retention; e.g., imagine
> > > keeping a snapshot for 2 weeks! So we should force the user to specify a
> > > TTL:
> > >
> > > IgniteCacheSnapshot snapshot =
> > >     ignite.cache("mycache").snapshots().create("abc", 2, TimeUnit.HOURS);
> > >
> > > Such functionality would allow for "reporting checkpoints" and "time
> > > travel", for example, where you want users to be able to query the data
> > > as it stood 1 hour ago, 2 hours ago, etc.
> > >
> > > What do you think?
> > >
> > > P.S.: We do have some form of snapshotting in the Compute checkpointing
> > > functionality, but my proposal is to generalise the notion.
> > >
> > > Regards,
> > >
> > > *Raúl Kripalani*
> > > PMC & Committer @ Apache Ignite, Apache Camel | Integration, Big Data and
> > > Messaging Engineer
> > > http://about.me/raulkripalani | http://www.linkedin.com/in/raulkripalani
> > > http://blog.raulkr.net | twitter: @raulvk
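To make the semantics discussed in the thread concrete (a consistent, read-only, point-in-time view that several queries run against, scoped with try-with-resources as Sergi suggests), here is a minimal single-JVM sketch. Every name in it (`SnapshotCache`, `CacheSnapshot`, `query(Predicate)`) is hypothetical and is not Ignite API; a `Predicate` stands in for the SQL `query("select ...")` calls, and the snapshot is simply a full copy taken under the same lock that writers use. A real distributed snapshot would need cluster-wide coordination that this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical single-JVM cache, used only to illustrate the idea; not an Ignite class.
class SnapshotCache<K, V> {
    private final Map<K, V> live = new HashMap<>();

    synchronized void put(K key, V value) {
        live.put(key, value);
    }

    // The copy is taken while holding the same lock that put() uses, so no
    // write can interleave with it: the snapshot reflects a single point in time.
    synchronized CacheSnapshot<K, V> snapshot() {
        return new CacheSnapshot<>(new HashMap<>(live));
    }
}

// Hypothetical read-only view that is AutoCloseable, so it can be scoped
// with try-with-resources as suggested in the thread.
class CacheSnapshot<K, V> implements AutoCloseable {
    private volatile Map<K, V> frozen;

    CacheSnapshot(Map<K, V> copy) {
        this.frozen = Collections.unmodifiableMap(copy);
    }

    // Stand-in for snapshot.query("select ..."): returns the values that
    // match the filter, evaluated against the frozen copy only.
    List<V> query(Predicate<? super V> filter) {
        Map<K, V> view = frozen;
        if (view == null) {
            throw new IllegalStateException("snapshot is closed");
        }
        List<V> result = new ArrayList<>();
        for (V value : view.values()) {
            if (filter.test(value)) {
                result.add(value);
            }
        }
        return result;
    }

    // Releases the copy so it can be garbage-collected; later queries fail.
    @Override
    public void close() {
        frozen = null;
    }
}
```

Writes made to the cache after `snapshot()` returns are invisible to every `query` on that snapshot, and leaving the `try` block releases the copy, which addresses the concern about keeping snapshots alive for a long time. A full copy is the simplest way to get isolation; it is memory-hungry for large caches, where a copy-on-write or versioned (MVCC-style) store would be the more likely design.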
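Raul's named-snapshot proposal makes the TTL mandatory, as in `create("abc", 2, TimeUnit.HOURS)`. A minimal sketch of that rule, assuming a hypothetical `NamedSnapshotRegistry` class (not Ignite API) that stores read-only copies by name and refuses to return one after its TTL has elapsed, might look like this. The source map is assumed not to change while it is being copied, and the nanosecond clock is injectable only so that expiry can be tested without sleeping.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical registry of named, read-only snapshots; not an Ignite class.
class NamedSnapshotRegistry<K, V> {
    // One stored snapshot together with its expiry deadline.
    private final class Entry {
        final Map<K, V> data;
        final long expiresAtNanos;

        Entry(Map<K, V> data, long expiresAtNanos) {
            this.data = data;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final Map<String, Entry> snapshots = new ConcurrentHashMap<>();
    private final LongSupplier clock; // nanosecond time source

    NamedSnapshotRegistry() {
        this(System::nanoTime);
    }

    NamedSnapshotRegistry(LongSupplier clock) {
        this.clock = clock;
    }

    // The TTL is a required argument; there is deliberately no overload without it.
    void create(String name, Map<K, V> source, long ttl, TimeUnit unit) {
        if (ttl <= 0) {
            throw new IllegalArgumentException("TTL must be positive");
        }
        Map<K, V> copy = Collections.unmodifiableMap(new HashMap<>(source));
        long expiresAt = clock.getAsLong() + unit.toNanos(ttl);
        snapshots.put(name, new Entry(copy, expiresAt));
    }

    // Returns the snapshot if it exists and has not expired. Expired entries
    // are removed lazily here, on lookup.
    Optional<Map<K, V>> get(String name) {
        Entry e = snapshots.get(name);
        if (e == null) {
            return Optional.empty();
        }
        // Subtraction keeps the comparison correct even if nanoTime() overflows.
        if (clock.getAsLong() - e.expiresAtNanos >= 0) {
            snapshots.remove(name, e);
            return Optional.empty();
        }
        return Optional.of(e.data);
    }
}
```

Because expiry here is only checked on `get`, a snapshot that is never looked up again keeps its memory until then; an implementation aimed at the retention problem Raul describes would probably also sweep expired entries periodically.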