Anton,

You referred to failover scenarios. I believe everything is described in the IEP, but to make this discussion self-contained, could you please outline what prevents primary partition relocation while Read Repair is in progress? Is it a transaction or an explicit lock?
Mon, 15 Jul 2019 at 23:49, Ivan Rakov <ivan.glu...@gmail.com>:
>
> Anton,
>
> > Step-by-step:
> > 1) primary locked on key mention (get/put) at pessimistic/!read-committed tx
> > 2) backups locked on prepare
> > 3) primary unlocked on finish
> > 4) backups unlocked on finish (after the primary)
> > correct?
> Yes, this matches my understanding of the transaction protocol, with one
> minor exception: steps 3 and 4 are inverted in the case of one-phase commit.
>
> > Agree, but it seems there is no need to acquire the lock; we just have to wait
> > until the entry becomes unlocked.
> > - entry locked means that the previous tx's "finish" phase is in progress
> > - entry unlocked means the read value is up-to-date (the previous "finish" phase
> > has finished)
> > correct?
> Diving deeper: an entry is locked if its GridCacheMapEntry.localCandidates
> queue is not empty (the first item in the queue is the transaction that
> owns the lock).
>
> > we just have to wait
> > until the entry becomes unlocked.
> This may work.
> If the consistency-checking code has acquired the lock on the primary, a backup can be
> in one of two states:
> - not locked - and new locks won't appear, as we are holding the lock on the primary
> - still locked by the transaction that owned the lock on the primary just before our
> checking code ran - in that case the checking code should simply wait for the lock release
>
> Best Regards,
> Ivan Rakov
>
> On 15.07.2019 9:34, Anton Vinogradov wrote:
> > Ivan R.
> >
> > Thanks for joining!
> >
> > I got the idea, but I'm not sure I see the way to a fix.
> >
> > AFAIK (I may be wrong, please correct me if necessary), with 2PC, locks are
> > acquired on backups during the "prepare" phase and released in the "finish"
> > phase after the primary has fully committed.
> > Step-by-step:
> > 1) primary locked on key mention (get/put) at pessimistic/!read-committed tx
> > 2) backups locked on prepare
> > 3) primary unlocked on finish
> > 4) backups unlocked on finish (after the primary)
> > correct?
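The four-step lock lifecycle quoted above can be sketched as a toy event-order model. All class and event names are invented for illustration; this is not Ignite code, just the ordering that 2PC guarantees:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the 2PC lock lifecycle quoted above; names are invented, not Ignite APIs. */
public class TwoPcLockOrder {

    /** Replays the lock events of one pessimistic 2PC transaction in protocol order. */
    public static List<String> run() {
        List<String> events = new ArrayList<>();
        events.add("primary:lock");    // 1) primary locked on key mention (get/put)
        events.add("backup:lock");     // 2) backups locked on prepare
        events.add("primary:unlock");  // 3) primary unlocked on finish
        events.add("backup:unlock");   // 4) backups unlocked on finish, after the primary
        return events;
    }

    public static void main(String[] args) {
        List<String> e = run();
        // The backup lock spans the primary's unlock: a reader that locks the
        // primary may still find the backup locked by the previous tx.
        assert e.indexOf("backup:lock") < e.indexOf("primary:unlock");
        assert e.indexOf("primary:unlock") < e.indexOf("backup:unlock");
        System.out.println(e);
    }
}
```

What the model makes explicit is that the interval between "backup:lock" and "backup:unlock" strictly contains "primary:unlock", which is exactly why a checking read that holds the primary lock can safely just wait for the backup entry to become unlocked.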
> >
> > So, acquiring locks on backups outside the "prepare" phase may cause
> > unexpected behavior in the case of a primary failure or other errors.
> > It's definitely possible to update failover to solve this issue, but that
> > seems to be an overcomplicated way.
> > The main question here is: is there any simple way?
> >
> >>> checking read from backup will just wait for commit if it's in progress.
> > Agree, but it seems there is no need to acquire the lock; we just have to wait
> > until the entry becomes unlocked.
> > - entry locked means that the previous tx's "finish" phase is in progress
> > - entry unlocked means the read value is up-to-date (the previous "finish" phase
> > has finished)
> > correct?
> >
> > On Mon, Jul 15, 2019 at 8:37 AM Ivan Pavlukhin <vololo...@gmail.com> wrote:
> >
> >> Anton,
> >>
> >> I did not know about the mechanics of locking entries on backups during the
> >> prepare phase. Thank you for pointing that out!
> >>
> >> Fri, 12 Jul 2019 at 22:45, Ivan Rakov <ivan.glu...@gmail.com>:
> >>> Hi Anton,
> >>>
> >>>> Each get method now checks the consistency.
> >>>> Check means:
> >>>> 1) tx lock acquired on primary
> >>>> 2) gained data from each owner (primary and backups)
> >>>> 3) data compared
> >>> Did you consider acquiring locks on backups as well during your check,
> >>> just like 2PC prepare does?
> >>> If there's an HB between steps 1 (lock primary) and 2 (update primary +
> >>> lock backup + update backup), you can be sure that there will be no
> >>> false-positive results and no deadlocks either. The protocol won't be
> >>> complicated: a checking read from a backup will just wait for the commit
> >>> if it's in progress.
> >>>
> >>> Best Regards,
> >>> Ivan Rakov
> >>>
> >>> On 12.07.2019 9:47, Anton Vinogradov wrote:
> >>>> Igniters,
> >>>>
> >>>> Let me explain the problem in detail.
> >>>> Read Repair in a pessimistic tx (locks acquired on the primary, full sync, 2PC)
> >>>> is able to see a consistency violation because the backups are not updated yet.
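The "just wait until the entry becomes unlocked" idea discussed above can be sketched with plain java.util.concurrent locks. This is a model of the protocol, not Ignite code: the two ReentrantLocks stand in for the per-entry lock candidates, and all names are invented.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Model of a checking read that waits out the previous tx instead of locking backups. */
public class BackupCheckingRead {
    final ReentrantLock primaryEntryLock = new ReentrantLock();
    final ReentrantLock backupEntryLock = new ReentrantLock();
    volatile int primaryValue;
    volatile int backupValue;

    /** Previous tx: primary and backup locked, primary unlocked first (2PC finish order). */
    void previousTxFinish(int value) throws InterruptedException {
        primaryEntryLock.lock();
        backupEntryLock.lock();      // held since "prepare"
        primaryValue = value;
        primaryEntryLock.unlock();   // 3) primary unlocked on finish
        Thread.sleep(50);            // window where the backup is still stale
        backupValue = value;
        backupEntryLock.unlock();    // 4) backup unlocked after the primary
    }

    /** Checking read: lock the primary, then just wait for the backup entry lock. */
    boolean check() throws InterruptedException {
        primaryEntryLock.lockInterruptibly();    // no new tx can touch this key now
        try {
            backupEntryLock.lockInterruptibly(); // waits for the previous "finish" to complete
            try {
                return primaryValue == backupValue; // backup is up-to-date: no false positive
            } finally {
                backupEntryLock.unlock();
            }
        } finally {
            primaryEntryLock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BackupCheckingRead r = new BackupCheckingRead();
        Thread tx = new Thread(() -> {
            try { r.previousTxFinish(42); } catch (InterruptedException e) { throw new RuntimeException(e); }
        });
        tx.start();
        // Whatever the interleaving, the check never reports a false violation.
        System.out.println("consistent = " + r.check()); // prints: consistent = true
        tx.join();
    }
}
```

Both sides take the locks in the same primary-then-backup order, so the waiting read cannot deadlock with an in-flight commit.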
> >>>> It does not seem to be a good idea to "fix" the code to unlock the primary only
> >>>> when the backups are updated; that would definitely cause a performance drop.
> >>>> Currently, there is no explicit sync feature that allows waiting until the backups
> >>>> have been updated by the previous tx.
> >>>> The previous tx just sends a GridNearTxFinishResponse to the originating node.
> >>>> Bad ideas for handling this:
> >>>> - retry a few times (a false positive is still possible)
> >>>> - lock the tx entry on the backups (will definitely break the failover logic)
> >>>> - wait for the same entry version on the backups within some timeout (will require
> >>>> huge changes to the "get" logic, and a false positive is still possible)
> >>>>
> >>>> Is there any simple fix for this issue?
> >>>> Thanks in advance for any tips.
> >>>>
> >>>> Ivan,
> >>>> thanks for your interest.
> >>>>
> >>>>>> 4. A very fast and lucky txB writes a value 2 for the key on primary and
> >>>>>> backup.
> >>>> AFAIK, reordering is not possible, since the backups are "prepared" before the
> >>>> primary releases the lock.
> >>>> So, consistency is guaranteed by failover and by the "prepare" phase of 2PC.
> >>>> It seems the problem is NOT with consistency in AI, but with the consistency
> >>>> detection implementation (RR) and possible "false positive" results.
> >>>> BTW, I checked the 1PC case (only one data node in the test) and got no issues.
> >>>>
> >>>> On Fri, Jul 12, 2019 at 9:26 AM Ivan Pavlukhin <vololo...@gmail.com> wrote:
> >>>>> Anton,
> >>>>>
> >>>>> Is this behavior observed for 2PC or for the 1PC optimization? Doesn't it
> >>>>> mean that things can be even worse and an inconsistent write is
> >>>>> possible on a backup? E.g. in this scenario:
> >>>>> 1. txA writes a value 1 for the key on the primary.
> >>>>> 2. txA unlocks the key on the primary.
> >>>>> 3. txA freezes before updating the backup.
> >>>>> 4. A very fast and lucky txB writes a value 2 for the key on primary and
> >>>>> backup.
> >>>>> 5. txA wakes up and writes 1 for the key on the backup.
> >>>>> 6. As a result, there is 2 on the primary and 1 on the backup.
> >>>>>
> >>>>> Naively, it seems that locks should be released only after all replicas have
> >>>>> been updated.
> >>>>>
> >>>>> Wed, 10 Jul 2019 at 16:36, Anton Vinogradov <a...@apache.org>:
> >>>>>> Folks,
> >>>>>>
> >>>>>> I'm now investigating unexpected repairs [1] with Read Repair usage in
> >>>>>> testAccountTxNodeRestart.
> >>>>>> I updated [2] the test to check whether any repairs happen.
> >>>>>> The test's name is now "testAccountTxNodeRestartWithReadRepair".
> >>>>>>
> >>>>>> Each get method now checks the consistency.
> >>>>>> Check means:
> >>>>>> 1) tx lock acquired on primary
> >>>>>> 2) gained data from each owner (primary and backups)
> >>>>>> 3) data compared
> >>>>>>
> >>>>>> Sometimes a backup may have an obsolete value during such a check.
> >>>>>>
> >>>>>> This seems to happen because the tx commit on the primary goes the following
> >>>>>> way (check the code [2] for details):
> >>>>>> 1) performing localFinish (releases the tx lock)
> >>>>>> 2) performing dhtFinish (commits on the backups)
> >>>>>> 3) transferring control back to the caller
> >>>>>>
> >>>>>> So it seems the problem here is that "tx lock released on primary" does not
> >>>>>> mean that the backups are updated, but "commit() method finished in the
> >>>>>> caller's thread" does.
> >>>>>> This means that, currently, there is no happens-before between
> >>>>>> 1) thread 1 committed data on the primary and the tx lock can be reobtained
> >>>>>> 2) thread 2 reads from a backup
> >>>>>> but there is still a strong HB between "commit() finished" and "backup updated".
> >>>>>>
> >>>>>> So it seems possible, for example, to get a notification from a
> >>>>>> continuous query, then read from a backup and get an obsolete value.
> >>>>>>
> >>>>>> Is this "partial happens-before" behavior expected?
> >>>>>>
> >>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-11973
> >>>>>> [2] https://github.com/apache/ignite/pull/6679/files
> >>>>>> [3] org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal#finishTx
> >>>>>
> >>>>> --
> >>>>> Best regards,
> >>>>> Ivan Pavlukhin
> >>
> >> --
> >> Best regards,
> >> Ivan Pavlukhin

-- 
Best regards,
Ivan Pavlukhin
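The "partial happens-before" that started this thread can be made concrete with a minimal two-thread model of the commit flow (localFinish → dhtFinish → return to caller). All names are invented for the sketch; the latches stand in for the primary lock release and, for determinism, for the demo's interleaving.

```java
import java.util.concurrent.CountDownLatch;

/** Model of the commit flow above: the primary lock is released before backups commit. */
public class PartialHappensBefore {
    volatile int primary;
    volatile int backup;
    final CountDownLatch primaryUnlocked = new CountDownLatch(1);
    final CountDownLatch staleReadDone = new CountDownLatch(1);

    /** commit(): 1) localFinish  2) dhtFinish  3) return to the caller. */
    void commit(int value) throws InterruptedException {
        primary = value;              // 1) localFinish: primary committed, tx lock released
        primaryUnlocked.countDown();
        staleReadDone.await();        // demo only: hold the stale window open for the reader
        backup = value;               // 2) dhtFinish: backups committed
    }                                 // 3) returning establishes HB with "backup updated"

    /** A reader that got in right after the primary unlock, e.g. via a continuous query. */
    int readBackupAfterPrimaryUnlock() throws InterruptedException {
        primaryUnlocked.await();      // the primary tx lock is already released...
        int observed = backup;        // ...but there is no HB with the backup update: stale read
        staleReadDone.countDown();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        PartialHappensBefore p = new PartialHappensBefore();
        Thread committer = new Thread(() -> {
            try { p.commit(1); } catch (InterruptedException e) { throw new RuntimeException(e); }
        });
        committer.start();
        int seen = p.readBackupAfterPrimaryUnlock();
        committer.join();             // commit() has returned, so the backup is updated
        System.out.println("after primary unlock: " + seen + ", after commit(): " + p.backup);
        // prints: after primary unlock: 0, after commit(): 1
    }
}
```

The model matches the observation in the thread: there is a strong HB between "commit() finished" and "backup updated", but none between "primary lock released" and a concurrent read from the backup.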