Re: Exchange stucks while node restoring state from WAL

Pavel Kovalenko Fri, 03 Aug 2018 07:46:22 -0700

Hello Maxim,

1) Yes, Discovery Manager starts after GridCacheProcessor, which
starts GridCacheDatabaseSharedManager, which in turn invokes
readMetastorage on start.


2) Before we complete the local join future, we create an Exchange future
for the local node join and add it to ExchangeManager. So, when the local
join future completes, the first PME on that node should already be running,
or at least the first Exchange future will be in the Exchange worker queue.
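
The ordering in point 2 can be illustrated with a toy model. This is only a
sketch: the class, field and method names below are hypothetical stand-ins,
not Ignite classes, and a plain queue of strings replaces the real Exchange
worker queue.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

/** Toy model of the ordering in point 2. All names here are hypothetical, not Ignite APIs. */
final class LocalJoinOrderingSketch {
    /** Hypothetical stand-in for the Exchange worker queue. */
    final Queue<String> exchangeQueue = new ArrayDeque<>();

    /** Hypothetical stand-in for the local join future. */
    final CompletableFuture<Void> localJoinFuture = new CompletableFuture<>();

    /** Enqueues the first exchange future, and only then completes the local join future. */
    void onLocalJoin() {
        exchangeQueue.add("exchange-for-local-join");
        localJoinFuture.complete(null);
    }

    /** Returns true if a listener on the local join future sees the first exchange already queued. */
    static boolean exchangeQueuedBeforeJoinCompletes() {
        LocalJoinOrderingSketch node = new LocalJoinOrderingSketch();
        boolean[] seen = new boolean[1];

        // The listener is registered before completion, so it runs inside complete().
        node.localJoinFuture.thenRun(() -> seen[0] = !node.exchangeQueue.isEmpty());
        node.onLocalJoin();

        return seen[0];
    }
}
```

Because the exchange is enqueued before the future completes, any code that
waits on the local join future can rely on the first exchange already being
present.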

3) I don't fully understand what you mean by "update obsolete
partition counter", but the process is the following:
    a) We restore page memory state (which contains partition update
counters) from the last checkpoint.
    b) We repair page memory state using physical delta records from WAL.
    c) We apply logical records since the last checkpoint finish marker.
During this iteration, if we meet a DataRecord, we increment the update
counter on the appropriate partition. NOTE: This phase currently happens
during exchange.
    After that, partition exchange starts and information about actual
update counters is collected from FullMessage. If partition counters
are outdated, rebalance will start after PME ends.
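
Steps a) to c) and the counter comparison can be sketched as below. This is a
simplified model under stated assumptions, not the actual Ignite code: the
record kinds, the WalRecord class and both methods are hypothetical, physical
delta records are treated as a no-op for counters, and each data record
increments its partition counter by exactly one.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of counter recovery from a checkpoint plus WAL. Names are hypothetical. */
final class WalRecoverySketch {
    /** Hypothetical record kinds: page-level deltas, data updates, other logical records. */
    enum Kind { PHYSICAL_DELTA, DATA, OTHER_LOGICAL }

    /** Hypothetical WAL record carrying only its kind and target partition. */
    static final class WalRecord {
        final Kind kind;
        final int partition;

        WalRecord(Kind kind, int partition) {
            this.kind = kind;
            this.partition = partition;
        }
    }

    /** Restores partition update counters from a checkpoint and the WAL written after it. */
    static Map<Integer, Long> restoreCounters(Map<Integer, Long> checkpointCounters,
                                              List<WalRecord> walSinceCheckpoint) {
        // a) Start from the counters stored in the last checkpoint.
        Map<Integer, Long> counters = new HashMap<>(checkpointCounters);

        for (WalRecord rec : walSinceCheckpoint) {
            // b) Physical deltas repair pages; this model assumes they leave counters unchanged.
            // c) Each data record bumps the update counter of its partition.
            if (rec.kind == Kind.DATA)
                counters.merge(rec.partition, 1L, Long::sum);
        }

        return counters;
    }

    /** Returns partitions whose local counter is behind the counter reported by the cluster. */
    static Set<Integer> outdatedPartitions(Map<Integer, Long> local, Map<Integer, Long> clusterWide) {
        Set<Integer> outdated = new TreeSet<>();

        for (Map.Entry<Integer, Long> e : clusterWide.entrySet()) {
            if (local.getOrDefault(e.getKey(), 0L) < e.getValue())
                outdated.add(e.getKey());
        }

        return outdated;
    }
}
```

In this model, the partitions returned by outdatedPartitions are the ones
that would need rebalance after PME ends.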

4) I think yes, because all you need is the cache descriptors of the caches
currently started on the grid. Some information can be retrieved
from the static configuration. Information about dynamically started caches
can be retrieved from the Discovery Data Bag received when a node joins the
ring.
See 
org.apache.ignite.internal.processors.cache.ClusterCachesInfo#onGridDataReceived

5) I think the main problem is that metastorage and page memory share one
WAL. But if this phase happens before PME is started on all cluster
nodes, I think it's acceptable for them to use 2 WAL iterations.
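
The "2 WAL iterations" idea can be sketched as two filtered passes over one
shared log. This is only an illustration: the Target enum, the Record class
and the replay method are hypothetical, and real WAL records are not simple
strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of two passes over one shared WAL. Names are hypothetical. */
final class TwoPassWalSketch {
    /** Hypothetical marker for which component a record belongs to. */
    enum Target { METASTORAGE, PAGE_MEMORY }

    /** Hypothetical WAL record: a target component plus an opaque payload. */
    static final class Record {
        final Target target;
        final String payload;

        Record(Target target, String payload) {
            this.target = target;
            this.payload = payload;
        }
    }

    /** Iterates the whole shared WAL and returns, in log order, only payloads for the given target. */
    static List<String> replay(List<Record> sharedWal, Target target) {
        List<String> applied = new ArrayList<>();

        for (Record rec : sharedWal) {
            if (rec.target == target)
                applied.add(rec.payload);
        }

        return applied;
    }
}
```

A node would call replay(wal, Target.METASTORAGE) first and
replay(wal, Target.PAGE_MEMORY) second, so each pass reads the full log
but applies only its own records.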

2018-08-03 10:28 GMT+03:00 Nikolay Izhikov <nizhi...@apache.org>:

> Hello, Maxim.
>
> > 1) Is it correct that readMetastore() happens after node starts but
> > before including node into the ring?
>
> I think yes.
> You can have some kind of metainformation required on node join.
>
> > 5) In our final solution, should readMetastore and restoreMemory
> > for a newly joined node be performed in one step?
>
> I think, no.
>
> Metainformation can be required to restore memory.
> So we have to restore metainformation as a first step and restore the whole
> memory as a second step.
>
> On Fri, 03/08/2018 at 09:44 +0300, Maxim Muzafarov wrote:
> > Hi Igniters,
> >
> >
> > I'm working on bug [1] and have some questions about the final
> > implementation. I've probably already found answers to some of
> > them, but I want to be sure. Please help me clarify the details.
> >
> >
> > The key problem here is that we are reading WAL and restoring
> > the memory state of a newly joined node inside PME. Reading WAL can
> > consume a huge amount of time, so the whole cluster gets stuck and
> > waits for the single node.
> >
> >
> > 1) Is it correct that readMetastore() happens after node starts
> > but before including node into the ring?
> >
> > 2) After the onDone() method is called for LocalJoinFuture on the
> > local node, can we proceed with initiating PME on the local node?
> >
> > 3) After reading the checkpoint and restoring memory for a newly
> > joined node, how and when do we update obsolete partition update
> > counters? During historical rebalance, right?
> >
> > 4) Should we restoreMemory for new joined node before PME
> > initiates on the other nodes in cluster?
> >
> > 5) In our final solution, should readMetastore and restoreMemory
> > for a newly joined node be performed in one step?
> >
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-7196
>
