Re: Historical rebalance

Seliverstov Igor Thu, 29 Nov 2018 03:47:48 -0800

Ivan,

different transactions may be applied in different order on backup nodes.
That's why we need an active tx set
and some sorting by their update times. The idea is to identify a point in
time which starting from we may lost some updates.
This point:
   1) is the last acknowledged by all backups (including possible further
demander) update on timeline;
   2) have a specific update counter (aka back-counter) which we going to
start iteration from.


After additional thinking on, I've identified a rule:

There is two fences:
  1) update counter (UC) - this means that all updates, with less UC than
applied one, was applied on a node, having this UC.
  2) update in scope of TX - all updates are applied one by one
sequentially, this means that the fact of update guaranties the previous
update (statement) was finished on all TX participants.

Сombining them, we can say the next:

All updates, that was acknowledged at the time the last update of tx, which
updated UC, applied, are guaranteed to be presented on a node having such UC

We can use this rule to find an iterator start pointer.

ср, 28 нояб. 2018 г. в 20:26, Павлухин Иван <[email protected]>:

> Guys,
>
> Another one idea. We can introduce additional update counter which is
> incremented by MVCC transactions right after executing operation (like
> is done for classic transactions). And we can use that counter for
> searching needed WAL records. Can it did the trick?
>
> P.S. Mentally I am trying to separate facilities providing
> transactions and durability. And it seems to me that those facilities
> are in different dimensions.
> ср, 28 нояб. 2018 г. в 16:26, Павлухин Иван <[email protected]>:
> >
> > Sorry, if it was stated that a SINGLE transaction updates are applied
> > in a same order on all replicas then I have no questions so far. I
> > thought about reordering updates coming from different transactions.
> > > I have not got why we can assume that reordering is not possible. What
> > have I missed?
> > ср, 28 нояб. 2018 г. в 13:26, Павлухин Иван <[email protected]>:
> > >
> > > Hi,
> > >
> > > Regarding Vladimir's new idea.
> > > > We assume that transaction can be represented as a set of
> independent operations, which are applied in the same order on both primary
> and backup nodes.
> > > I have not got why we can assume that reordering is not possible. What
> > > have I missed?
> > > вт, 27 нояб. 2018 г. в 14:42, Seliverstov Igor <[email protected]>:
> > > >
> > > > Vladimir,
> > > >
> > > > I think I got your point,
> > > >
> > > > It should work if we do the next:
> > > > introduce two structures: active list (txs) and candidate list
> (updCntr ->
> > > > txn pairs)
> > > >
> > > > Track active txs, mapping them to actual update counter at update
> time.
> > > > On each next update put update counter, associated with previous
> update,
> > > > into a candidates list possibly overwrite existing value (checking
> txn)
> > > > On tx finish remove tx from active list only if appropriate update
> counter
> > > > (associated with finished tx) is applied.
> > > > On update counter update set the minimal update counter from the
> candidates
> > > > list as a back-counter, clear the candidate list and remove an
> associated
> > > > tx from the active list if present.
> > > > Use back-counter instead of actual update counter in demand message.
> > > >
> > > > вт, 27 нояб. 2018 г. в 12:56, Seliverstov Igor <[email protected]
> >:
> > > >
> > > > > Ivan,
> > > > >
> > > > > 1) The list is saved on each checkpoint, wholly (all transactions
> in
> > > > > active state at checkpoint begin).
> > > > > We need whole the list to get oldest transaction because after
> > > > > the previous oldest tx finishes, we need to get the following one.
> > > > >
> > > > > 2) I guess there is a description of how persistent storage works
> and how
> > > > > it restores [1]
> > > > >
> > > > > Vladimir,
> > > > >
> > > > > the whole list of what we going to store on checkpoint (updated):
> > > > > 1) Partition counter low watermark (LWM)
> > > > > 2) WAL pointer of earliest active transaction write to partition
> at the
> > > > > time the checkpoint have started
> > > > > 3) List of prepared txs with acquired partition counters (which
> were
> > > > > acquired but not applied yet)
> > > > >
> > > > > This way we don't need any additional info in demand message.
> Start point
> > > > > can be easily determined using stored WAL "back-pointer".
> > > > >
> > > > > [1]
> > > > >
> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood#IgnitePersistentStore-underthehood-LocalRecoveryProcess
> > > > >
> > > > >
> > > > > вт, 27 нояб. 2018 г. в 11:19, Vladimir Ozerov <
> [email protected]>:
> > > > >
> > > > >> Igor,
> > > > >>
> > > > >> Could you please elaborate - what is the whole set of information
> we are
> > > > >> going to save at checkpoint time? From what I understand this
> should be:
> > > > >> 1) List of active transactions with WAL pointers of their first
> writes
> > > > >> 2) List of prepared transactions with their update counters
> > > > >> 3) Partition counter low watermark (LWM) - the smallest partition
> counter
> > > > >> before which there are no prepared transactions.
> > > > >>
> > > > >> And the we send to supplier node a message: "Give me all updates
> starting
> > > > >> from that LWM plus data for that transactions which were active
> when I
> > > > >> failed".
> > > > >>
> > > > >> Am I right?
> > > > >>
> > > > >> On Fri, Nov 23, 2018 at 11:22 AM Seliverstov Igor <
> [email protected]>
> > > > >> wrote:
> > > > >>
> > > > >> > Hi Igniters,
> > > > >> >
> > > > >> > Currently I’m working on possible approaches how to implement
> historical
> > > > >> > rebalance (delta rebalance using WAL iterator) over MVCC caches.
> > > > >> >
> > > > >> > The main difficulty is that MVCC writes changes on tx active
> phase while
> > > > >> > partition update version, aka update counter, is being applied
> on tx
> > > > >> > finish. This means we cannot start iteration over WAL right
> from the
> > > > >> > pointer where the update counter updated, but should include
> updates,
> > > > >> which
> > > > >> > the transaction that updated the counter did.
> > > > >> >
> > > > >> > These updates may be much earlier than the point where the
> update
> > > > >> counter
> > > > >> > was updated, so we have to be able to identify the point where
> the first
> > > > >> > update happened.
> > > > >> >
> > > > >> > The proposed approach includes:
> > > > >> >
> > > > >> > 1) preserve list of active txs, sorted by the time of their
> first update
> > > > >> > (using WAL ptr of first WAL record in tx)
> > > > >> >
> > > > >> > 2) persist this list on each checkpoint (together with TxLog for
> > > > >> example)
> > > > >> >
> > > > >> > 4) send whole active tx list (transactions which were in active
> state at
> > > > >> > the time the node was crushed, empty list in case of graceful
> node
> > > > >> stop) as
> > > > >> > a part of partition demand message.
> > > > >> >
> > > > >> > 4) find a checkpoint where the earliest tx exists in persisted
> txs and
> > > > >> use
> > > > >> > saved WAL ptr as a start point or apply current approach in
> case the
> > > > >> active
> > > > >> > tx list (sent on previous step) is empty
> > > > >> >
> > > > >> > 5) start iteration.
> > > > >> >
> > > > >> > Your thoughts?
> > > > >> >
> > > > >> > Regards,
> > > > >> > Igor
> > > > >>
> > > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Ivan Pavlukhin
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>

Re: Historical rebalance

Reply via email to