Re: Data Snapshots in Ignite

M G Thu, 22 Oct 2015 07:00:26 -0700

I have a specific use-case for snapshots that my current client wants to
make use of; it may be helpful if I share it with you.


At the start of day we run a batch load from a reference data system, and
then run a set of Start-Of-Day (SOD) reports.  Those reports must run against
a consistent view of the data - no updates from external sources are
permitted whilst the reports are running.  However, once the SOD reports
have run, we then want to receive updates from our databases and apply
those to the cache.  Here is the use-case: we want to have the original SOD
data available, snapshotted, so that we can re-run reports if they fail, or
compare what was used in the SOD reports with what those reports now
produce.

At present we are going to build an extra layer around my Ignite-based
library that provides this snapshot functionality.
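
To make the idea concrete, here is a rough sketch of what such a layer could
look like. It is only an illustration under assumptions: SnapshottingCache and
its methods are hypothetical names, not Ignite API, and a plain in-memory map
stands in for the Ignite cache. It also assumes updates are paused while the
copy is taken, as they are during our SOD window.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical wrapper layer; none of these names come from Ignite. */
final class SnapshottingCache<K, V> {
    // Stand-in for the live cache that receives updates after the SOD reports.
    private final Map<K, V> live = new ConcurrentHashMap<>();
    // Named, read-only copies of the live data.
    private final Map<String, Map<K, V>> snapshots = new ConcurrentHashMap<>();

    void put(K key, V value) {
        live.put(key, value);
    }

    V get(K key) {
        return live.get(key);
    }

    /**
     * Copies the current live entries into an unmodifiable map and stores it
     * under the given name. The copy is only consistent if no writes happen
     * while it is being taken (assumption: external updates are suspended).
     */
    Map<K, V> snapshot(String name) {
        Map<K, V> copy = Collections.unmodifiableMap(new HashMap<>(live));
        snapshots.put(name, copy);
        return copy;
    }

    /** Returns the named snapshot, if one has been taken. */
    Optional<Map<K, V>> getSnapshot(String name) {
        return Optional.ofNullable(snapshots.get(name));
    }

    /** Discards a named snapshot; returns true if it existed. */
    boolean dropSnapshot(String name) {
        return snapshots.remove(name) != null;
    }
}
```

With something like this, the SOD reports would read from the map returned by
snapshot("sod") while the live cache carries on receiving database updates, and
a failed report could be re-run against getSnapshot("sod"). A full physical
copy is the simplest thing that works; it costs memory proportional to the
cache size.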

On Thu, Oct 22, 2015 at 1:38 PM, Raul Kripalani <ra...@apache.org> wrote:

> Hey Andre,
>
> I think I answered some of your questions in my response to Dmitriy [1].
> Could you please have a look and tell me if it answers your questions?
>
> N.B.: My idea is based around the typical use case for LevelDb Snapshots,
> but we might create something entirely different in Ignite if the community
> wants to.
>
> [1]
>
> http://apache-ignite-developers.2346864.n4.nabble.com/Data-Snapshots-in-Ignite-tp4183p4220.html
>
> *Raúl Kripalani*
> PMC & Committer @ Apache Ignite, Apache Camel | Integration, Big Data and
> Messaging Engineer
> http://about.me/raulkripalani | http://www.linkedin.com/in/raulkripalani
> http://blog.raulkr.net | twitter: @raulvk
>
> On Thu, Oct 22, 2015 at 12:49 PM, Andrey Kornev <andrewkor...@hotmail.com>
> wrote:
>
> > Hello,
> >
> > Just a few questions.
> >
> > 1) It's not clear from the proposed API how to capture/retrieve a
> > consistent snapshot of multiple caches. If my query involves a join I'd
> > like to ensure consistency across all join participants.
> > 2) Implementation wise, is the snapshot just a physical copy of all cache
> > entries and their indexes? Or some other mechanism is being considered?
> > 3) Isolation: is the snapshot isolated with respect to concurrent
> > modifications?
> > 4) Serialization: what are my options to ensure that I can still read the
> > data from the old snapshots as my key/value class definitions change over
> > time?
> >
> >  I feel I do not quite understand the specific use case this feature is
> > expected to be applicable to. Why keeping a snapshot for 2 weeks is
> > unimaginable, but 1 or 2 hours is ok?
> >
> > Also, I think forcing people to set a TTL on a snapshot is pointless and
> > will be abused by setting it to an unreasonably large value, just in case.
> >
> > Thanks
> > Andrey
> >
> > > From: ra...@apache.org
> > > Date: Wed, 21 Oct 2015 10:06:25 +0100
> > > Subject: Data Snapshots in Ignite
> > > To: dev@ignite.apache.org
> > >
> > > Hey guys,
> > >
> > > LevelDb has a functionality called Snapshots which provides a consistent
> > > read-only view of the DB at a given point in time, against which queries
> > > can be executed.
> > >
> > > To my knowledge, this functionality doesn't exist in the world of open
> > > source In-Memory Computing. Ignite could be an innovator here.
> > >
> > > Ignite Snapshots would allow queries, distributed closures, map-reduce
> > > jobs, etc. It could be useful for Spark RDDs to avoid data shift while the
> > > computation is taking place (not sure if there's already some form of
> > > snapshotting, though). Same for IGFS.
> > >
> > > Example usage:
> > >
> > >     IgniteCacheSnapshot snapshot =
> > >         ignite.cache("mycache").snapshots().create();
> > >
> > >     // all three queries are executed against a view of the cache at the
> > >     // point in time where it was snapshotted
> > >     snapshot.query("select ...");
> > >     snapshot.query("select ...");
> > >     snapshot.query("select ...");
> > >
> > > In fact, it would be awesome to be able to logically save this snapshot
> > > with a name so that later jobs, queries, etc. can run on top of it, e.g.:
> > >
> > >     IgniteCacheSnapshot snapshot =
> > >         ignite.cache("mycache").snapshots().create("abc");
> > >
> > >     // ...
> > >     // in another module of a distributed system, or in another thread in
> > >     // parallel, use the saved snapshot
> > >     IgniteCacheSnapshot snapshot =
> > >         ignite.cache("mycache").snapshots().get("abc");
> > >     ....
> > >
> > > Named snapshotting can be dangerous due to data retention, e.g. imagine
> > > keeping a snapshot for 2 weeks! So we should force the user to specify a
> > > TTL:
> > >
> > >     IgniteCacheSnapshot snapshot =
> > >         ignite.cache("mycache").snapshots().create("abc", 2, TimeUnit.HOURS);
> > >
> > > Such functionality would allow for "reporting checkpoints" and "time
> > > travel", for example, where you want users to be able to query the data as
> > > it stood 1 hour ago, 2 hours ago, etc.
> > >
> > > What do you think?
> > >
> > > P.S.: We do have some form of snapshotting in the Compute checkpointing
> > > functionality – but my proposal is to generalise the notion.
> > >
> > > Regards,
> > >
> > > *Raúl Kripalani*
> > > PMC & Committer @ Apache Ignite, Apache Camel | Integration, Big Data and
> > > Messaging Engineer
> > > http://about.me/raulkripalani | http://www.linkedin.com/in/raulkripalani
> > > http://blog.raulkr.net | twitter: @raulvk
> >
> >
>
