Serge,

Cool feature!

I have following questions:
1) MVCC will be a kind of "cache mode"? And we can have caches with old
behavior and caches with MVCC?
2) " Garbage collection" - may be we should give another name to not
intersect with JVM GC?

Thanks!

On Tue, Aug 15, 2017 at 8:11 PM, Serge Puchnin n <sergey.puch...@gmail.com>
wrote:

> Hello Ignite Developers,
>
> I’d like to start a discussion about the design of  MVCC implementation
> [1].
> It’ll help us to solve an issue when a user might see a partially
> committed transaction.
>
> Below you can find the proposed design.
> Please provide your feedback/thoughts about suggested solution.
>
> Thanks a lot,
> Sergey Puchnin
>
> [1] https://issues.apache.org/jira/browse/IGNITE-3478
>
>
> *Multi-Version Concurrency Control Architecture*
>
>
> Abstract
>
> This page contains high-level description of MVCC Architecture for JIRA
> https://issues.apache.org/jira/browse/IGNITE-3478
> Problem Description
>
> Current Ignite SQL does not take into account transaction boundaries. For
> example, if a transaction atomically changes the balance of two accounts,
> then a concurrent SQL query can see a partially committed transaction. This
> works for data analytics but does not work for more strict and demanding
> scenarios.
> It would be perfect if we had a mode which ensures transaction boundaries
> are taken into account for SQL query.
> Design Description
>
> Multi-Version Concurrency Control (MVCC) is a concurrency control
> mechanism that has been implemented in numerous RDBMS systems.
> The main approach of this solution is instead of change data in-place a
> session creates a new version of data, and different transactions are able
> to get a correct version in concurrent surroundings.
> Consequently, readers never block writers, writers never block readers, 
> readers
> don't need any locks.
>
> In cluster environment data are well distributed via different partitions,
> nodes and it's necessary to support cases when read-write transactions are
> processed by nodes in an accidental order.
>
> At this point, we have to revise the only transactions that need to update
> or read data on more than one partition.
>
> To provide cross-cache ACID transactions for all isolation levels and lock
> models in distributed environment it necessary to determinate what changes
> were made before a transaction was started and are visible for the
> transaction.
>
> To solve that new component is provided in the system - TS coordinator and
> two new attributes are added to an entity: minTS and maxTS.
> The coordinator will order all changes for multi-partition transactions.
> minTS and maxTS will determine a visibility of specific version for a
> transaction.
>
> A base workflow is include next steps.
> 1. Initialize
>
> During initialization, a transaction asks a node that acts as a TS
> coordinator to get a new transaction identifier (currentTX). The currentTX
> is a monotonic, unique identifier to order all changes.
> The coordinator adds current transaction into an active transaction list
> in RUNNING state.
> Also, the coordinator informs the transaction about all transactions were
> registered before its and now are in RUNNING state (excluding TX).
> 2. Writing
>
> On Prepare stage the transaction get all necessary locks.
> On Commit stage the transaction:
> 1. To find current version in PK Index (minTS = currentTX or maxTS = 0)
> 2. To find current version in Data Page by a link from #1
> 3. Insert new version into Data Page and set minTS = currentTX, maxTS=0
> 4. Insert new version into PK Index set minTS = currentTX set maxTS=0 and
> set link to #2
> 5. Update current version in Data Page (find via link from #1) set maxTS =
> currentTX
> 6. Update current version in PK Index Page (find at #1) set maxTS =
> currentTX
> 7. Find current version in Secondary Index (minTS = currentTX or maxTS = 0)
> 8. Insert new version into Secondary Index set minTS = currentTX set
> maxTS=0 set link to #4
> 9. Update current version in Secondary Index set maxTS=currentTX
>
> Due to Two-Phase Commit it's not possible to get a case when two not
> committed transactions update the same key.
> For delete statements steps 3, 4, 8 should be skipped.
> 3. Reading
>
> For a transaction with Repeatable Read, a value will be cached on Near
> Node.
> For other levels, a transaction should find a version for which the
> following condition holds:
> minTS >= CurrentTS and (maxTS < CurrentTS or maxTS = 0). Also minTS, maxTS
> should not be in excluding TX list.
> For Read Committed, CurrentTS is an identifier on start statement moment.
> For Serialization level, CurrentTS is an identifier on start session
> moment. In addition to that if a session has found that maxTS !=0 an error
> "It's not possible to serialize the result" should be raised.
> 4. Garbage collection
>
> During a Garbage collection procedure any entity's versions with maxTS
> less than minimal currentTX for sessions with RUNNING state can be deleted.
> There are no cases when we should show these values. minTS for next version
> should be updated to 0.
> 5. Sample
>
> As a sample, there are three write operation and check changes in data
> structures.
>
>
> ​
>
>
>


-- 
Alexey Kuznetsov

Reply via email to