Hi Alex,

Thanks a lot for your feedback!

#1 Could you please share a case where we would need different modes for different caches? Supporting two modes at the same time is quite a far-reaching change.
#2 Certainly we can use another name, "Extinct Version Ejection" for example.

BR, Serge

On Tue, 15 Aug 2017 at 18:22, Alexey Kuznetsov <akuznet...@apache.org> wrote:
> Serge,
>
> Cool feature!
>
> I have the following questions:
> 1) Will MVCC be a kind of "cache mode"? I.e. can we have caches with the
> old behavior and caches with MVCC?
> 2) "Garbage collection" - maybe we should give it another name so it is
> not confused with JVM GC?
>
> Thanks!
>
> On Tue, Aug 15, 2017 at 8:11 PM, Serge Puchnin <sergey.puch...@gmail.com>
> wrote:
>
>> Hello Ignite Developers,
>>
>> I'd like to start a discussion about the design of the MVCC
>> implementation [1].
>> It will help us solve an issue where a user might see a partially
>> committed transaction.
>>
>> Below you can find the proposed design.
>> Please provide your feedback/thoughts on the suggested solution.
>>
>> Thanks a lot,
>> Sergey Puchnin
>>
>> [1] https://issues.apache.org/jira/browse/IGNITE-3478
>>
>>
>> *Multi-Version Concurrency Control Architecture*
>>
>>
>> Abstract
>>
>> This page contains a high-level description of the MVCC architecture for
>> JIRA https://issues.apache.org/jira/browse/IGNITE-3478
>>
>> Problem Description
>>
>> Current Ignite SQL does not take transaction boundaries into account.
>> For example, if a transaction atomically changes the balance of two
>> accounts, a concurrent SQL query can see the partially committed
>> transaction. This is acceptable for data analytics but not for stricter,
>> more demanding scenarios.
>> It would be ideal to have a mode that ensures transaction boundaries are
>> taken into account for SQL queries.
>>
>> Design Description
>>
>> Multi-Version Concurrency Control (MVCC) is a concurrency control
>> mechanism that has been implemented in numerous RDBMS systems.
>> Its main idea is that instead of changing data in place, a session
>> creates a new version of the data, and different transactions are able
>> to read the correct version under concurrent access.
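The copy-on-write idea described above can be sketched in a few lines of Java. This is an illustrative sketch only, not Ignite's actual code; the class and method names are assumptions made for the example. A write appends a new version instead of overwriting, so a concurrent reader with an older snapshot keeps seeing a consistent value:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of an MVCC version chain for a single key:
// writes append new versions; readers pick the newest version
// written at or before their snapshot.
public class VersionChainSketch {
    static final class Version {
        final long txId;    // id of the transaction that wrote this version
        final String value;
        Version(long txId, String value) { this.txId = txId; this.value = value; }
    }

    // Newest version first; older versions stay readable for running snapshots.
    private final Deque<Version> chain = new ArrayDeque<>();

    void write(long txId, String value) {
        chain.addFirst(new Version(txId, value)); // append, never overwrite
    }

    /** Read the newest version written at or before the given snapshot id. */
    String read(long snapshotTxId) {
        for (Version v : chain)
            if (v.txId <= snapshotTxId)
                return v.value;
        return null; // key did not exist at that snapshot
    }

    public static void main(String[] args) {
        VersionChainSketch row = new VersionChainSketch();
        row.write(1, "balance=100");
        row.write(5, "balance=50");
        System.out.println(row.read(3)); // a tx started before tx 5 still sees balance=100
        System.out.println(row.read(7)); // a later tx sees balance=50
    }
}
```

This is why readers never block writers and vice versa: the two operations touch different version records.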
>> Consequently, readers never block writers, writers never block readers,
>> and readers don't need any locks.
>>
>> In a cluster environment, data is distributed across different
>> partitions and nodes, so it is necessary to support cases where
>> read-write transactions are processed by nodes in an arbitrary order.
>>
>> At this point, we only have to consider transactions that need to
>> update or read data on more than one partition.
>>
>> To provide cross-cache ACID transactions for all isolation levels and
>> lock models in a distributed environment, it is necessary to determine
>> which changes were made before a transaction started and are therefore
>> visible to the transaction.
>>
>> To solve this, a new component is introduced into the system - the TS
>> coordinator - and two new attributes are added to an entity: minTS and
>> maxTS.
>> The coordinator orders all changes for multi-partition transactions.
>> minTS and maxTS determine the visibility of a specific version for a
>> transaction.
>>
>> The base workflow includes the following steps.
>>
>> 1. Initialization
>>
>> During initialization, a transaction asks the node that acts as the TS
>> coordinator for a new transaction identifier (currentTX). The currentTX
>> is a monotonic, unique identifier used to order all changes.
>> The coordinator adds the current transaction to the active transaction
>> list in RUNNING state.
>> The coordinator also informs the transaction about all transactions that
>> were registered before it and are still in RUNNING state (the excluded
>> TX list).
>>
>> 2. Writing
>>
>> On the Prepare stage the transaction acquires all necessary locks.
>> On the Commit stage the transaction:
>> 1. Finds the current version in the PK Index (minTS = currentTX or
>> maxTS = 0)
>> 2. Finds the current version in the Data Page via the link from #1
>> 3. Inserts the new version into the Data Page, setting minTS = currentTX,
>> maxTS = 0
>> 4. Inserts the new version into the PK Index, setting minTS = currentTX,
>> maxTS = 0, and the link to #2
>> 5. Updates the current version in the Data Page (found via the link from
>> #1), setting maxTS = currentTX
>> 6. Updates the current version in the PK Index page (found at #1),
>> setting maxTS = currentTX
>> 7. Finds the current version in the Secondary Index (minTS = currentTX
>> or maxTS = 0)
>> 8. Inserts the new version into the Secondary Index, setting
>> minTS = currentTX, maxTS = 0, and the link to #4
>> 9. Updates the current version in the Secondary Index, setting
>> maxTS = currentTX
>>
>> Due to Two-Phase Commit, two uncommitted transactions can never update
>> the same key at once.
>> For delete statements, steps 3, 4, and 8 should be skipped.
>>
>> 3. Reading
>>
>> For a transaction with Repeatable Read, a value is cached on the Near
>> Node.
>> For other levels, a transaction should find the version for which the
>> following condition holds:
>> minTS <= CurrentTS and (maxTS > CurrentTS or maxTS = 0). Also, minTS and
>> maxTS must not be in the excluded TX list.
>> For Read Committed, CurrentTS is the identifier taken at the start of
>> the statement.
>> For the Serializable level, CurrentTS is the identifier taken at the
>> start of the session. In addition, if a session finds that maxTS != 0,
>> an error "It's not possible to serialize the result" should be raised.
>>
>> 4. Garbage collection
>>
>> During a garbage collection procedure, any entity versions with maxTS
>> less than the minimal currentTX among sessions in RUNNING state can be
>> deleted, since there are no cases in which we should show these values.
>> minTS of the next version should be updated to 0.
>>
>> 5. Sample
>>
>> As a sample, there are three write operations and the resulting changes
>> in the data structures.
>>
>
> --
> Alexey Kuznetsov
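The visibility and garbage-collection rules in the proposal can be sketched as follows. This is a hedged illustration, not Ignite's implementation; the record, method names, and structure are assumptions. A version is visible to a snapshot when it was created at or before CurrentTS by a committed transaction and has not yet been superseded as of CurrentTS; a superseded version becomes collectible once no RUNNING transaction can still see it:

```java
import java.util.Set;

// Illustrative sketch of minTS/maxTS visibility and GC eligibility.
// minTS: tx id that created the version; maxTS: tx id that superseded it
// (0 while the version is still current).
public class MvccVisibilitySketch {
    record Version(long minTS, long maxTS, String value) {}

    /** Visible if created at or before the snapshot by a committed tx and
     *  not superseded as of the snapshot. */
    static boolean visible(Version v, long currentTS, Set<Long> runningTxs) {
        boolean created = v.minTS() <= currentTS && !runningTxs.contains(v.minTS());
        boolean notSuperseded = v.maxTS() == 0 || v.maxTS() > currentTS
                || runningTxs.contains(v.maxTS());
        return created && notSuperseded;
    }

    /** Collectible once its maxTS is below the oldest RUNNING tx id:
     *  no current or future snapshot can ever see it again. */
    static boolean collectible(Version v, long oldestRunningTx) {
        return v.maxTS() != 0 && v.maxTS() < oldestRunningTx;
    }

    public static void main(String[] args) {
        Set<Long> running = Set.of(7L);
        Version old = new Version(3, 7, "balance=100"); // being superseded by tx 7
        Version neu = new Version(7, 0, "balance=50");  // written by running tx 7

        // A reader with CurrentTS = 6 still sees the old version, not the
        // uncommitted new one.
        System.out.println(visible(old, 6, running)); // true
        System.out.println(visible(neu, 6, running)); // false
        System.out.println(collectible(old, 7));      // false: tx 7 still running
    }
}
```

The `collectible` check is the "minimal currentTX for sessions in RUNNING state" rule from the garbage-collection section: versions below that floor are invisible to every possible snapshot and can be purged.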