Re: kick off a discussion

Alexandre Boudnik Tue, 12 Jul 2016 11:45:39 -0700

Dmitriy, thank you for your time and questions, which helped me
realize what I forgot to mention!
See my answers inline; later I'll combine everything together to help
future readers :)


I put together some implementation ideas in Apache Ignite JIRA, as
promised: https://issues.apache.org/jira/browse/IGNITE-3457. I see
this facility as another CacheStore implementation, so it wouldn't
interfere with the basic principles of the Ignite platform.


On Mon, Jul 11, 2016 at 1:15 AM, Dmitriy Setrakyan
<dsetrak...@apache.org> wrote:
> My answers are inline…
>
> On Sat, Jul 9, 2016 at 3:04 AM, Dmitriy Setrakyan <dsetrak...@apache.org>
> wrote:
>
>> Thanks Sasha!
>>
>> Resending to the dev list.
>>
>> D.
>>
>> On Fri, Jul 8, 2016 at 2:02 PM, Alexandre Boudnik <alexan...@boudnik.org>
>> wrote:
>>
>>> Apache Ignite is a great platform, but it lacks certain capabilities
>>> that are common in the RDBMS world, such as:
>>> - Consistent online backup of data across the entire cluster (or for
>>> a specified set of caches)
>>>
>>
> I think you mean data center replication here. It is not an easy feature to
> implement, and so far has been handled by commercial vendors of Ignite,
> e.g. GridGain.
>
Actually not. Right here I meant exactly what I said: a full or
incremental backup of all or selected caches in a consistent state, so
that they can be restored in case of data loss or data corruption. One
important use case is OLAP systems (let's say for banking) that have
been built on the Apache Ignite platform.

And you are right, data center replication can be easily implemented
based on log/snapshot shipment.

>
>> - Hierarchical snapshots for a specified set of caches
>>>
>>
> What do you mean by hierarchical?
>
In this particular case the notion of hierarchical snapshots is very
similar to the same notion used in SAN appliances or by VirtualBox or
VMware. Using the concept of snapshots we can do all these amazing things:
- full and incremental backup
- restore
- rollback to checkpoint
- roll forward
much easier, with minimal memory and I/O overhead.
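
To make the idea a bit more concrete, here is a rough sketch in plain
Java. It is not Ignite API and not the design from the JIRA ticket: the
Snapshot class, its method names, and the String key/value types are all
made up for illustration, and cache values are assumed to be non-null.
Each snapshot keeps only the entries that changed relative to its
parent, and a restore applies the chain of deltas from the root down.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: a snapshot holds only the entries changed since its parent. */
final class Snapshot {
    private final Snapshot parent;                      // null for a full (root) snapshot
    private final Map<String, Optional<String>> delta;  // Optional.empty() marks a removed key

    private Snapshot(Snapshot parent, Map<String, Optional<String>> delta) {
        this.parent = parent;
        this.delta = new HashMap<>(delta);
    }

    /** Full snapshot: every entry of the cache is recorded. */
    static Snapshot full(Map<String, String> cache) {
        Map<String, Optional<String>> d = new HashMap<>();
        cache.forEach((k, v) -> d.put(k, Optional.of(v)));
        return new Snapshot(null, d);
    }

    /** Incremental snapshot: only the differences from this snapshot's restored state. */
    Snapshot incremental(Map<String, String> cache) {
        Map<String, String> base = restore();
        Map<String, Optional<String>> d = new HashMap<>();
        // Entries that are new or whose value changed since the parent.
        cache.forEach((k, v) -> {
            if (!v.equals(base.get(k))) d.put(k, Optional.of(v));
        });
        // Entries that existed in the parent state but have been removed.
        base.keySet().forEach(k -> {
            if (!cache.containsKey(k)) d.put(k, Optional.empty());
        });
        return new Snapshot(this, d);
    }

    /** Restore by applying deltas from the root snapshot down to this one. */
    Map<String, String> restore() {
        Deque<Snapshot> chain = new ArrayDeque<>();
        for (Snapshot s = this; s != null; s = s.parent) chain.push(s); // root ends up first
        Map<String, String> state = new HashMap<>();
        for (Snapshot s : chain) {
            s.delta.forEach((k, v) -> {
                if (v.isPresent()) state.put(k, v.get()); else state.remove(k);
            });
        }
        return state;
    }

    /** Number of entries stored in this snapshot alone (not counting its parents). */
    int deltaSize() { return delta.size(); }
}
```

In this sketch a full backup is Snapshot.full(cache), an incremental
backup is parent.incremental(cache), and a rollback to a checkpoint is
simply restore() on an older snapshot in the chain. A real
implementation would of course write the deltas to disk rather than
keep them on the heap.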

>
>> - Transaction log
>>>
>>
> Why does Ignite need it for in-memory transactions?
>
At the very least it is required to provide roll-forward functionality:
you restore the state of the cache from a checkpoint (the cache state
as of the moment the snapshot was made) and then reapply transactions
one by one.
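
As an illustration only, roll-forward could look roughly like the sketch
below. Again this is plain Java, not Ignite API: TxLogReplay, TxRecord,
and rollForward are hypothetical names, the log is assumed to be ordered
by transaction id, and a null value is assumed to mean that the key was
removed. The predicate decides which logged changes get reapplied, so
the same loop covers both "replay everything" and "replay up to
transaction N".

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of replaying a transaction log on top of a checkpoint. */
final class TxLogReplay {

    /** One logged change; a null value means the key was removed. */
    static final class TxRecord {
        final long txId;
        final String key;
        final String value;

        TxRecord(long txId, String key, String value) {
            this.txId = txId;
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Starts from a copy of the checkpoint state and reapplies the logged
     * changes one by one, in log order, skipping records the predicate rejects.
     */
    static Map<String, String> rollForward(Map<String, String> checkpoint,
                                           List<TxRecord> log,
                                           Predicate<TxRecord> accept) {
        Map<String, String> state = new HashMap<>(checkpoint); // checkpoint itself stays untouched
        for (TxRecord r : log) {
            if (!accept.test(r)) continue;
            if (r.value == null) state.remove(r.key); else state.put(r.key, r.value);
        }
        return state;
    }
}
```

For example, rollForward(checkpoint, log, r -> true) reapplies the whole
log, while rollForward(checkpoint, log, r -> r.txId <= 2) stops the
restored state after transaction 2.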

>
>> - Restore cluster state as of certain point in time
>>>
>>
> Given that such restorability may introduce lots of memory overhead, does
> it really make sense for an in-memory cache?
>
Actually, it will not consume any memory. It will use external memory,
such as HDD/SSD space, instead. And yes, I think that this
functionality makes complete sense for our users IRL, who will love
it.

>
>> - Rolling forward from snapshot with ability to filter/modify transactions
>>>
>>
> Same as above
>
The same as above: my customers in the trenches are begging for that feature.

>
>> - Asynchronous replication based either on log shipment or snapshot
>>> shipment
>>> -- Between clusters
>>>
>>
> This is the same as data center replication, no?
Including but not limited to. Log shipment or snapshot shipment could
also be used to implement a so-called "better-than-lambda-architecture"
for BI and OLAP, where data is replicated to a queryable data source
(let's say Oracle) as soon as it is produced by the OLTP system. We can
use RDBMS APIs such as Oracle Streams (going to be discontinued - sad)
or GoldenGate to filter changes from logs/snapshots and then apply them.
That approach allows us to preserve tons of legacy reports and BI
dashboards.

>
>
>> -- Continuous data export to, let’s say, an RDBMS
>>>
>>
> Don’t we already support it with our write-through feature to a database?
>
When write-through is used for non-local caches it may cause data
corruption in the RDBMS. I opened an issue for this a few weeks ago:
https://issues.apache.org/jira/browse/IGNITE-3321

>
>> It is also a necessity to reduce cold start time for huge clusters
>>> with strict SLAs.
>>>
>>
> What part are you trying to speed up here? Are you talking about loading
> data from databases?
>
I'm talking about the initial load from a persistent store when the
cluster has been cold-started (like from GridGain's Local Recoverable Store).

>
>>
>>> I'll put some implementation ideas in JIRA later on. I believe that
>>> this list is far from being complete, but I want the community to
>>> discuss the above-mentioned use cases.
>>>
>>> --Sasha
>>>
>>
>>
