Dmitriy, thank you for your time and questions, which helped me realize what I forgot to mention! See my answers inline; later I'll combine everything to help future readers :)
I have put together some implementation ideas in the Apache Ignite JIRA, as promised: https://issues.apache.org/jira/browse/IGNITE-3457. I see this facility as another CacheStore implementation, so it wouldn't interfere with the basic principles of the Ignite platform.

On Mon, Jul 11, 2016 at 1:15 AM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:

> My answers are inline…
>
> On Sat, Jul 9, 2016 at 3:04 AM, Dmitriy Setrakyan <dsetrak...@apache.org>
> wrote:
>
>> Thanks Sasha!
>>
>> Resending to the dev list.
>>
>> D.
>>
>> On Fri, Jul 8, 2016 at 2:02 PM, Alexandre Boudnik <alexan...@boudnik.org>
>> wrote:
>>
>>> Apache Ignite is a great platform, but it lacks certain capabilities
>>> which are common in the RDBMS world, such as:
>>> - Consistent on-line backup for data on the entire cluster (or for a
>>> specified set of caches)
>>>
>>
> I think you mean data center replication here. It is not an easy feature to
> implement, and so far has been handled by commercial vendors of Ignite,
> e.g. GridGain.
>

Actually not. Here I meant exactly what I said: a full or incremental backup of all or selected caches in a consistent state, so that they can be restored after data loss or data corruption. One important use case is OLAP systems (let's say for banking) built on the Apache Ignite platform. And you are right, data center replication can be easily implemented on top of log/snapshot shipment.

>
>>> - Hierarchical snapshots for a specified set of caches
>>>
>>
> What do you mean by hierarchical?
>

In this particular case, the notion of hierarchical snapshots is very similar to the one used in SAN appliances or by VirtualBox or VMware. Using snapshots, we can do all of these things much more easily, with minimal memory and I/O overhead:
- full and incremental backup
- restore
- rollback to a checkpoint
- roll forward

>
>>> - Transaction log
>>>
>>
> Why does Ignite need it for in-memory transactions?
>

At the very least, it is required to provide roll-forward functionality: you restore the state of the cache from a checkpoint (the cache state at the time the snapshot was made) and then reapply transactions one by one.

>
>>> - Restore cluster state as of a certain point in time
>>>
>>
> Given that such restorability may introduce lots of memory overhead, does
> it really make sense for an in-memory cache?
>

Actually, it will not consume any memory. It will use external storage, such as HDD/SSD space, instead. And yes, I think this functionality makes complete sense for our users IRL, who will love it.

>
>>> - Rolling forward from a snapshot with the ability to filter/modify transactions
>>>
>>
> Same as above
>

The same as above: my customers in the trenches are begging for that feature.

>
>>> - Asynchronous replication based either on log shipment or snapshot
>>> shipment
>>> -- Between clusters
>>>
>>
> This is the same as data center replication, no?
>

Including but not limited to that: log shipment or snapshot shipment could also be used to implement a so-called "better-than-lambda-architecture" for BI and OLAP, where data is replicated to a queryable data source (let's say Oracle) as soon as it is produced by the OLTP system. We can use RDBMS APIs such as Oracle Streams (going to be discontinued, sadly) or GoldenGate to filter changes from logs/snapshots and then apply them. That approach saves tons of legacy reports and BI dashboards.

>
>>> -- Continuous data export to let’s say RDBMS
>>>
>>
> Don’t we already support it with our write-through feature to a database?
>

When write-through is used for non-local caches, it may cause data corruption in the RDBMS. I opened this issue a few weeks ago: https://issues.apache.org/jira/browse/IGNITE-3321

>
>>> It is also a necessity to reduce cold start time for huge clusters
>>> with strict SLAs.
>>>
>>
> What part are you trying to speed up here? Are you talking about loading
> data from databases?
>

I'm talking about the initial load from the persistent store when the cluster has been cold-started (like from GridGain's Local Recoverable Store).

>
>>> I'll put some implementation ideas in JIRA later on. I believe that
>>> this list is far from being complete, but I want the community to
>>> discuss these abovementioned use cases.
>>>
>>> --Sasha
>>>
>>
>>
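
P.S. To make the roll-forward idea discussed above a bit more concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions, not the design from the JIRA ticket and not Ignite API: `LogRecord`, `roll_forward`, `until_tx`, and `transform` are hypothetical names, the snapshot is modeled as a plain dict, and the transaction log as a list of committed write sets ordered by transaction id.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class LogRecord:
    """One committed transaction in a hypothetical log (not an Ignite type)."""
    tx_id: int                        # assumed to increase with commit order
    writes: Dict[str, Optional[Any]]  # key -> new value; None means "remove"


def roll_forward(
    snapshot: Dict[str, Any],
    log: Iterable[LogRecord],
    until_tx: Optional[int] = None,
    transform: Optional[Callable[[LogRecord], Optional[LogRecord]]] = None,
) -> Dict[str, Any]:
    """Rebuild cache state from a snapshot by replaying logged transactions.

    until_tx  -- stop after this transaction id (point-in-time restore)
    transform -- may modify a record, or return None to filter it out
    """
    cache = dict(snapshot)  # work on a copy; the snapshot stays intact
    for record in sorted(log, key=lambda r: r.tx_id):
        if until_tx is not None and record.tx_id > until_tx:
            break
        if transform is not None:
            record = transform(record)
            if record is None:
                continue  # transaction filtered out
        for key, value in record.writes.items():
            if value is None:
                cache.pop(key, None)
            else:
                cache[key] = value
    return cache


snapshot = {"acct:1": 100, "acct:2": 50}
log = [
    LogRecord(1, {"acct:1": 90, "acct:2": 60}),
    LogRecord(2, {"acct:3": 10}),
    LogRecord(3, {"acct:2": None}),
]

# Point-in-time restore: replay only transactions 1 and 2.
restored = roll_forward(snapshot, log, until_tx=2)

# Filtered roll-forward: replay everything except transaction 2.
filtered = roll_forward(
    snapshot, log, transform=lambda r: None if r.tx_id == 2 else r
)
```

The same loop covers both the "restore as of a point in time" and the "filter/modify transactions" use cases; a real implementation would of course read the snapshot and the log from disk and would have to deal with cluster-wide consistency, which this sketch ignores.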