Sorry, if it was stated that a SINGLE transaction's updates are applied in
the same order on all replicas, then I have no questions so far. I thought
about reordering of updates coming from different transactions.

> I have not understood why we can assume that reordering is not possible.
> What have I missed?

Wed, Nov 28, 2018 at 13:26, Павлухин Иван <vololo...@gmail.com>:
>
> Hi,
>
> Regarding Vladimir's new idea.
> > We assume that a transaction can be represented as a set of independent
> > operations, which are applied in the same order on both primary and
> > backup nodes.
> I have not understood why we can assume that reordering is not possible.
> What have I missed?
>
> Tue, Nov 27, 2018 at 14:42, Seliverstov Igor <gvvinbl...@gmail.com>:
> >
> > Vladimir,
> >
> > I think I got your point.
> >
> > It should work if we do the following:
> > introduce two structures: an active list (txs) and a candidate list
> > (updCntr -> txn pairs).
> >
> > Track active txs, mapping them to the actual update counter at update time.
> > On each next update, put the update counter associated with the previous
> > update into the candidate list, possibly overwriting the existing value
> > (checking the txn).
> > On tx finish, remove the tx from the active list only if the appropriate
> > update counter (associated with the finished tx) has been applied.
> > On update counter update, set the minimal update counter from the
> > candidate list as a back-counter, clear the candidate list and remove the
> > associated tx from the active list if present.
> > Use the back-counter instead of the actual update counter in the demand
> > message.
> >
> > Tue, Nov 27, 2018 at 12:56, Seliverstov Igor <gvvinbl...@gmail.com>:
> >
> > > Ivan,
> > >
> > > 1) The list is saved on each checkpoint, wholly (all transactions in
> > > active state at the moment the checkpoint begins).
> > > We need the whole list to get the oldest transaction, because after
> > > the previous oldest tx finishes, we need to get the following one.
> > >
> > > 2) I guess there is a description of how the persistent storage works
> > > and how it recovers [1].
> > >
> > > Vladimir,
> > >
> > > the whole list of what we are going to store on checkpoint (updated):
> > > 1) Partition counter low watermark (LWM)
> > > 2) WAL pointer of the earliest active transaction write to the partition
> > > at the time the checkpoint started
> > > 3) List of prepared txs with acquired partition counters (which were
> > > acquired but not applied yet)
> > >
> > > This way we don't need any additional info in the demand message. The
> > > start point can be easily determined using the stored WAL "back-pointer".
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood#IgnitePersistentStore-underthehood-LocalRecoveryProcess
> > >
> > >
> > > Tue, Nov 27, 2018 at 11:19, Vladimir Ozerov <voze...@gridgain.com>:
> > >
> > >> Igor,
> > >>
> > >> Could you please elaborate - what is the whole set of information we are
> > >> going to save at checkpoint time? From what I understand this should be:
> > >> 1) List of active transactions with WAL pointers of their first writes
> > >> 2) List of prepared transactions with their update counters
> > >> 3) Partition counter low watermark (LWM) - the smallest partition counter
> > >> before which there are no prepared transactions.
> > >>
> > >> And then we send to the supplier node a message: "Give me all updates
> > >> starting from that LWM plus data for those transactions which were
> > >> active when I failed".
> > >>
> > >> Am I right?
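To make the checkpoint data and the demand message enumerated just above concrete, here is a rough, untested Java sketch. None of these classes or fields exist in Ignite; the names are hypothetical and only mirror the items listed by Vladimir and Igor, and a real WAL pointer is of course not just a long.

```java
import java.util.List;
import java.util.Map;

/** What a node could persist per partition at checkpoint time (hypothetical sketch). */
class CheckpointTxState {
    /** 1) Partition counter low watermark (LWM): no prepared tx exists below it. */
    final long lowWatermark;

    /** 2) WAL pointer of the earliest write made by a tx still active when the checkpoint started. */
    final long earliestActiveTxWalPtr;

    /** 3) Prepared txs mapped to the partition counters they acquired but have not applied yet. */
    final Map<Long, Long> preparedTxs;

    CheckpointTxState(long lowWatermark, long earliestActiveTxWalPtr, Map<Long, Long> preparedTxs) {
        this.lowWatermark = lowWatermark;
        this.earliestActiveTxWalPtr = earliestActiveTxWalPtr;
        this.preparedTxs = preparedTxs;
    }
}

/** What the demander could send to the supplier node (hypothetical sketch). */
class PartitionDemand {
    /** "Give me all updates starting from that LWM ..." */
    final long fromCounter;

    /** "... plus data for those transactions which were active when I failed." */
    final List<Long> activeTxsAtCrash;

    PartitionDemand(long fromCounter, List<Long> activeTxsAtCrash) {
        this.fromCounter = fromCounter;
        this.activeTxsAtCrash = activeTxsAtCrash;
    }
}
```

In this model the start point on the supplier would be determined from the stored WAL "back-pointer" rather than from the counter alone, as Igor notes above.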
> > >>
> > >> On Fri, Nov 23, 2018 at 11:22 AM Seliverstov Igor <gvvinbl...@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi Igniters,
> > >> >
> > >> > Currently I'm working on possible approaches to implementing historical
> > >> > rebalance (delta rebalance using a WAL iterator) over MVCC caches.
> > >> >
> > >> > The main difficulty is that MVCC writes changes during the tx active
> > >> > phase, while the partition update version, aka the update counter, is
> > >> > applied on tx finish. This means we cannot start iteration over the WAL
> > >> > right from the pointer where the update counter was updated, but should
> > >> > also include the updates made by the transaction that updated the
> > >> > counter.
> > >> >
> > >> > These updates may be much earlier than the point where the update
> > >> > counter was updated, so we have to be able to identify the point where
> > >> > the first update happened.
> > >> >
> > >> > The proposed approach includes:
> > >> >
> > >> > 1) preserve a list of active txs, sorted by the time of their first
> > >> > update (using the WAL ptr of the first WAL record in the tx)
> > >> >
> > >> > 2) persist this list on each checkpoint (together with the TxLog, for
> > >> > example)
> > >> >
> > >> > 3) send the whole active tx list (transactions which were in active
> > >> > state at the time the node crashed, an empty list in case of a graceful
> > >> > node stop) as a part of the partition demand message
> > >> >
> > >> > 4) find a checkpoint where the earliest tx exists in the persisted txs
> > >> > and use the saved WAL ptr as a start point, or apply the current
> > >> > approach in case the active tx list (sent in the previous step) is empty
> > >> >
> > >> > 5) start iteration.
> > >> >
> > >> > Your thoughts?
> > >> >
> > >> > Regards,
> > >> > Igor
> > >
>
> --
> Best regards,
> Ivan Pavlukhin
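The active-list / candidate-list bookkeeping Igor describes in his mail of Tue, Nov 27 at 14:42 could look roughly like the untested sketch below. The class and method names are hypothetical, and the exact moment when a counter is considered "applied" is an assumption here, not something the thread settles.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-partition tracker for the back-counter scheme (sketch only). */
class BackCounterTracker {
    /** Active txs mapped to the update counter of their latest update. */
    private final Map<Long, Long> activeTxs = new HashMap<>();

    /** Candidate list: update counter of a previous update -> tx that made the next update. */
    private final TreeMap<Long, Long> candidates = new TreeMap<>();

    /** Counter to use in the demand message instead of the actual update counter. */
    private volatile long backCounter;

    /** Called on each update of a transaction; {@code updCntr} is the counter reserved for it. */
    synchronized void onTxUpdate(long txId, long updCntr) {
        Long prevCntr = activeTxs.put(txId, updCntr);

        // The counter associated with the previous update becomes a candidate,
        // possibly overwriting an existing value for the same tx.
        if (prevCntr != null)
            candidates.put(prevCntr, txId);
    }

    /** Called when the partition update counter is actually applied (advanced). */
    synchronized void onCounterApplied() {
        if (candidates.isEmpty())
            return;

        // The minimal candidate becomes the new back-counter.
        Map.Entry<Long, Long> min = candidates.firstEntry();

        backCounter = min.getKey();

        // Remove the associated tx from the active list if present, then reset the candidates.
        activeTxs.remove(min.getValue());

        candidates.clear();
    }

    /** Called on tx finish; the tx stays in the active list until its counter is applied. */
    synchronized void onTxFinish(long txId, boolean counterApplied) {
        if (counterApplied)
            activeTxs.remove(txId);
    }

    /** The value to put into the partition demand message. */
    long backCounter() {
        return backCounter;
    }
}
```

A TreeMap keeps candidates ordered by update counter, so the minimal one (the next back-counter) is available via firstEntry(); the back-counter, not the actual update counter, would then go into the partition demand message.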
--
Best regards,
Ivan Pavlukhin