Hi Igniters,
I'm working on bug [1] and have some questions about the final implementation. I have probably already found answers to some of them, but I want to be sure. Please help me clarify the details.

The key problem is that we read the WAL and restore the memory state of a newly joined node inside PME. Reading the WAL can take a huge amount of time, so the whole cluster gets stuck waiting for a single node.

1) Is it correct that readMetastore() happens after the node starts but before the node is included in the ring?
2) Once onDone() has been called for LocalJoinFuture on the local node, can we proceed with initiating PME on the local node?
3) After reading the checkpoint and restoring memory for the newly joined node, how and when do we update the obsolete partition update counters? During historical rebalance, right?
4) Should we perform restoreMemory for the newly joined node before PME is initiated on the other nodes in the cluster?
5) In our final solution, should readMetastore and restoreMemory be performed in one step for the newly joined node?

[1] https://issues.apache.org/jira/browse/IGNITE-7196

--
Maxim Muzafarov