Sorry, if it was stated that a SINGLE transaction's updates are applied in
the same order on all replicas, then I have no questions so far. I thought
about reordering of updates coming from different transactions.

> I have not understood why we can assume that reordering is not possible.
> What have I missed?

Wed, Nov 28, 2018 at 13:26, Павлухин Иван <vololo...@gmail.com>:
>
> Hi,
>
> Regarding Vladimir's new idea.
> > We assume that a transaction can be represented as a set of independent
> > operations, which are applied in the same order on both primary and
> > backup nodes.
> I have not understood why we can assume that reordering is not possible.
> What have I missed?
>
> Tue, Nov 27, 2018 at 14:42, Seliverstov Igor <gvvinbl...@gmail.com>:
> >
> > Vladimir,
> >
> > I think I got your point.
> >
> > It should work if we do the following:
> > introduce two structures: an active list (txs) and a candidate list
> > (updCntr -> txn pairs).
> >
> > Track active txs, mapping them to the actual update counter at update time.
> > On each next update, put the update counter associated with the previous
> > update into the candidate list, possibly overwriting the existing value
> > (checking the txn).
> > On tx finish, remove the tx from the active list only if the appropriate
> > update counter (associated with the finished tx) has been applied.
> > On update counter update, set the minimal update counter from the
> > candidate list as a back-counter, clear the candidate list and remove the
> > associated tx from the active list if present.
> > Use the back-counter instead of the actual update counter in the demand
> > message.
> >
> > Tue, Nov 27, 2018 at 12:56, Seliverstov Igor <gvvinbl...@gmail.com>:
> >
> > > Ivan,
> > >
> > > 1) The list is saved on each checkpoint, wholly (all transactions in
> > > active state at the moment the checkpoint begins).
> > > We need the whole list to get the oldest transaction, because after
> > > the previous oldest tx finishes, we need to get the following one.
> > >
> > > 2) I guess there is a description of how the persistent storage works
> > > and how it recovers [1].
> > >
> > > Vladimir,
> > >
> > > the whole list of what we are going to store on checkpoint (updated):
> > > 1) Partition counter low watermark (LWM)
> > > 2) WAL pointer of the earliest active transaction write to the partition
> > > at the time the checkpoint started
> > > 3) List of prepared txs with acquired partition counters (which were
> > > acquired but not applied yet)
> > >
> > > This way we don't need any additional info in the demand message. The
> > > start point can be easily determined using the stored WAL "back-pointer".
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood#IgnitePersistentStore-underthehood-LocalRecoveryProcess
> > >
> > >
> > > Tue, Nov 27, 2018 at 11:19, Vladimir Ozerov <voze...@gridgain.com>:
> > >
> > >> Igor,
> > >>
> > >> Could you please elaborate - what is the whole set of information we are
> > >> going to save at checkpoint time? From what I understand this should be:
> > >> 1) List of active transactions with WAL pointers of their first writes
> > >> 2) List of prepared transactions with their update counters
> > >> 3) Partition counter low watermark (LWM) - the smallest partition counter
> > >> before which there are no prepared transactions.
> > >>
> > >> And then we send to the supplier node a message: "Give me all updates
> > >> starting from that LWM plus data for those transactions which were
> > >> active when I failed".
> > >>
> > >> Am I right?
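To make the checkpoint data and the demand message enumerated just above concrete, here is a rough, untested Java sketch. None of these classes or fields exist in Ignite; the names are hypothetical and only mirror the items listed by Vladimir and Igor, and a real WAL pointer is of course not just a long.

```java
import java.util.List;
import java.util.Map;

/** What a node could persist per partition at checkpoint time (hypothetical sketch). */
class CheckpointTxState {
    /** 1) Partition counter low watermark (LWM): no prepared tx exists below it. */
    final long lowWatermark;

    /** 2) WAL pointer of the earliest write made by a tx still active when the checkpoint started. */
    final long earliestActiveTxWalPtr;

    /** 3) Prepared txs mapped to the partition counters they acquired but have not applied yet. */
    final Map<Long, Long> preparedTxs;

    CheckpointTxState(long lowWatermark, long earliestActiveTxWalPtr, Map<Long, Long> preparedTxs) {
        this.lowWatermark = lowWatermark;
        this.earliestActiveTxWalPtr = earliestActiveTxWalPtr;
        this.preparedTxs = preparedTxs;
    }
}

/** What the demander could send to the supplier node (hypothetical sketch). */
class PartitionDemand {
    /** "Give me all updates starting from that LWM ..." */
    final long fromCounter;

    /** "... plus data for those transactions which were active when I failed." */
    final List<Long> activeTxsAtCrash;

    PartitionDemand(long fromCounter, List<Long> activeTxsAtCrash) {
        this.fromCounter = fromCounter;
        this.activeTxsAtCrash = activeTxsAtCrash;
    }
}
```

In this model the start point on the supplier would be determined from the stored WAL "back-pointer" rather than from the counter alone, as Igor notes above.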
> > >>
> > >> On Fri, Nov 23, 2018 at 11:22 AM Seliverstov Igor <gvvinbl...@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi Igniters,
> > >> >
> > >> > Currently I'm working on possible approaches to implementing historical
> > >> > rebalance (delta rebalance using a WAL iterator) over MVCC caches.
> > >> >
> > >> > The main difficulty is that MVCC writes changes during the tx active
> > >> > phase, while the partition update version, aka the update counter, is
> > >> > applied on tx finish. This means we cannot start iteration over the WAL
> > >> > right from the pointer where the update counter was updated, but should
> > >> > also include the updates made by the transaction that updated the
> > >> > counter.
> > >> >
> > >> > These updates may be much earlier than the point where the update
> > >> > counter was updated, so we have to be able to identify the point where
> > >> > the first update happened.
> > >> >
> > >> > The proposed approach includes:
> > >> >
> > >> > 1) preserve a list of active txs, sorted by the time of their first
> > >> > update (using the WAL ptr of the first WAL record in the tx)
> > >> >
> > >> > 2) persist this list on each checkpoint (together with the TxLog, for
> > >> > example)
> > >> >
> > >> > 3) send the whole active tx list (transactions which were in active
> > >> > state at the time the node crashed, an empty list in case of a graceful
> > >> > node stop) as a part of the partition demand message
> > >> >
> > >> > 4) find a checkpoint where the earliest tx exists in the persisted txs
> > >> > and use the saved WAL ptr as a start point, or apply the current
> > >> > approach in case the active tx list (sent in the previous step) is empty
> > >> >
> > >> > 5) start iteration.
> > >> >
> > >> > Your thoughts?
> > >> >
> > >> > Regards,
> > >> > Igor
> > >
>
> --
> Best regards,
> Ivan Pavlukhin
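The active-list / candidate-list bookkeeping Igor describes in his mail of Tue, Nov 27 at 14:42 could look roughly like the untested sketch below. The class and method names are hypothetical, and the exact moment when a counter is considered "applied" is an assumption here, not something the thread settles.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-partition tracker for the back-counter scheme (sketch only). */
class BackCounterTracker {
    /** Active txs mapped to the update counter of their latest update. */
    private final Map<Long, Long> activeTxs = new HashMap<>();

    /** Candidate list: update counter of a previous update -> tx that made the next update. */
    private final TreeMap<Long, Long> candidates = new TreeMap<>();

    /** Counter to use in the demand message instead of the actual update counter. */
    private volatile long backCounter;

    /** Called on each update of a transaction; {@code updCntr} is the counter reserved for it. */
    synchronized void onTxUpdate(long txId, long updCntr) {
        Long prevCntr = activeTxs.put(txId, updCntr);

        // The counter associated with the previous update becomes a candidate,
        // possibly overwriting an existing value for the same tx.
        if (prevCntr != null)
            candidates.put(prevCntr, txId);
    }

    /** Called when the partition update counter is actually applied (advanced). */
    synchronized void onCounterApplied() {
        if (candidates.isEmpty())
            return;

        // The minimal candidate becomes the new back-counter.
        Map.Entry<Long, Long> min = candidates.firstEntry();

        backCounter = min.getKey();

        // Remove the associated tx from the active list if present, then reset the candidates.
        activeTxs.remove(min.getValue());

        candidates.clear();
    }

    /** Called on tx finish; the tx stays in the active list until its counter is applied. */
    synchronized void onTxFinish(long txId, boolean counterApplied) {
        if (counterApplied)
            activeTxs.remove(txId);
    }

    /** The value to put into the partition demand message. */
    long backCounter() {
        return backCounter;
    }
}
```

A TreeMap keeps candidates ordered by update counter, so the minimal one (the next back-counter) is available via firstEntry(); the back-counter, not the actual update counter, would then go into the partition demand message.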
--
Best regards,
Ivan Pavlukhin