Hi Alex,

Thanks a lot for your feedback!

#1 Could you please share a case where we would need different modes for different caches? Supporting two modes at the same time is quite a far-reaching change.
#2 Certainly we can use another name, "Extinct Version Ejection" for example.

BR, Serge

On Tue, 15 Aug 2017 at 18:22, Alexey Kuznetsov <akuznet...@apache.org> wrote:
> Serge,
>
> Cool feature!
>
> I have the following questions:
> 1) Will MVCC be a kind of "cache mode"? I.e. can we have caches with the
> old behavior and caches with MVCC?
> 2) "Garbage collection" - maybe we should give it another name so it is
> not confused with JVM GC?
>
> Thanks!
>
> On Tue, Aug 15, 2017 at 8:11 PM, Serge Puchnin <sergey.puch...@gmail.com>
> wrote:
>
>> Hello Ignite Developers,
>>
>> I'd like to start a discussion about the design of the MVCC
>> implementation [1].
>> It will help us solve an issue where a user might see a partially
>> committed transaction.
>>
>> Below you can find the proposed design.
>> Please provide your feedback/thoughts on the suggested solution.
>>
>> Thanks a lot,
>> Sergey Puchnin
>>
>> [1] https://issues.apache.org/jira/browse/IGNITE-3478
>>
>>
>> *Multi-Version Concurrency Control Architecture*
>>
>>
>> Abstract
>>
>> This page contains a high-level description of the MVCC architecture for
>> JIRA https://issues.apache.org/jira/browse/IGNITE-3478
>>
>> Problem Description
>>
>> Current Ignite SQL does not take transaction boundaries into account.
>> For example, if a transaction atomically changes the balance of two
>> accounts, a concurrent SQL query can see the partially committed
>> transaction. This is acceptable for data analytics but not for stricter,
>> more demanding scenarios.
>> It would be ideal to have a mode that ensures transaction boundaries are
>> taken into account for SQL queries.
>>
>> Design Description
>>
>> Multi-Version Concurrency Control (MVCC) is a concurrency control
>> mechanism that has been implemented in numerous RDBMS systems.
>> Its main idea is that instead of changing data in place, a session
>> creates a new version of the data, and different transactions are able
>> to read the correct version under concurrent access.
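The copy-on-write idea described above can be sketched in a few lines of Java. This is an illustrative sketch only, not Ignite's actual code; the class and method names are assumptions made for the example. A write appends a new version instead of overwriting, so a concurrent reader with an older snapshot keeps seeing a consistent value:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of an MVCC version chain for a single key:
// writes append new versions; readers pick the newest version
// written at or before their snapshot.
public class VersionChainSketch {
    static final class Version {
        final long txId;    // id of the transaction that wrote this version
        final String value;
        Version(long txId, String value) { this.txId = txId; this.value = value; }
    }

    // Newest version first; older versions stay readable for running snapshots.
    private final Deque<Version> chain = new ArrayDeque<>();

    void write(long txId, String value) {
        chain.addFirst(new Version(txId, value)); // append, never overwrite
    }

    /** Read the newest version written at or before the given snapshot id. */
    String read(long snapshotTxId) {
        for (Version v : chain)
            if (v.txId <= snapshotTxId)
                return v.value;
        return null; // key did not exist at that snapshot
    }

    public static void main(String[] args) {
        VersionChainSketch row = new VersionChainSketch();
        row.write(1, "balance=100");
        row.write(5, "balance=50");
        System.out.println(row.read(3)); // a tx started before tx 5 still sees balance=100
        System.out.println(row.read(7)); // a later tx sees balance=50
    }
}
```

This is why readers never block writers and vice versa: the two operations touch different version records.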
>> Consequently, readers never block writers, writers never block readers,
>> and readers don't need any locks.
>>
>> In a cluster environment, data is distributed across different
>> partitions and nodes, so it is necessary to support cases where
>> read-write transactions are processed by nodes in an arbitrary order.
>>
>> At this point, we only have to consider transactions that need to
>> update or read data on more than one partition.
>>
>> To provide cross-cache ACID transactions for all isolation levels and
>> lock models in a distributed environment, it is necessary to determine
>> which changes were made before a transaction started and are therefore
>> visible to the transaction.
>>
>> To solve this, a new component is introduced into the system - the TS
>> coordinator - and two new attributes are added to an entity: minTS and
>> maxTS.
>> The coordinator orders all changes for multi-partition transactions.
>> minTS and maxTS determine the visibility of a specific version for a
>> transaction.
>>
>> The base workflow includes the following steps.
>>
>> 1. Initialization
>>
>> During initialization, a transaction asks the node that acts as the TS
>> coordinator for a new transaction identifier (currentTX). The currentTX
>> is a monotonic, unique identifier used to order all changes.
>> The coordinator adds the current transaction to the active transaction
>> list in RUNNING state.
>> The coordinator also informs the transaction about all transactions that
>> were registered before it and are still in RUNNING state (the excluded
>> TX list).
>>
>> 2. Writing
>>
>> On the Prepare stage the transaction acquires all necessary locks.
>> On the Commit stage the transaction:
>> 1. Finds the current version in the PK Index (minTS = currentTX or
>> maxTS = 0)
>> 2. Finds the current version in the Data Page via the link from #1
>> 3. Inserts the new version into the Data Page, setting minTS = currentTX,
>> maxTS = 0
>> 4. Inserts the new version into the PK Index, setting minTS = currentTX,
>> maxTS = 0, and the link to #2
>> 5. Updates the current version in the Data Page (found via the link from
>> #1), setting maxTS = currentTX
>> 6. Updates the current version in the PK Index page (found at #1),
>> setting maxTS = currentTX
>> 7. Finds the current version in the Secondary Index (minTS = currentTX
>> or maxTS = 0)
>> 8. Inserts the new version into the Secondary Index, setting
>> minTS = currentTX, maxTS = 0, and the link to #4
>> 9. Updates the current version in the Secondary Index, setting
>> maxTS = currentTX
>>
>> Due to Two-Phase Commit, two uncommitted transactions can never update
>> the same key at once.
>> For delete statements, steps 3, 4, and 8 should be skipped.
>>
>> 3. Reading
>>
>> For a transaction with Repeatable Read, a value is cached on the Near
>> Node.
>> For other levels, a transaction should find the version for which the
>> following condition holds:
>> minTS <= CurrentTS and (maxTS > CurrentTS or maxTS = 0). Also, minTS and
>> maxTS must not be in the excluded TX list.
>> For Read Committed, CurrentTS is the identifier taken at the start of
>> the statement.
>> For the Serializable level, CurrentTS is the identifier taken at the
>> start of the session. In addition, if a session finds that maxTS != 0,
>> an error "It's not possible to serialize the result" should be raised.
>>
>> 4. Garbage collection
>>
>> During a garbage collection procedure, any entity versions with maxTS
>> less than the minimal currentTX among sessions in RUNNING state can be
>> deleted, since there are no cases in which we should show these values.
>> minTS of the next version should be updated to 0.
>>
>> 5. Sample
>>
>> As a sample, there are three write operations and the resulting changes
>> in the data structures.
>>
>
> --
> Alexey Kuznetsov
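The visibility and garbage-collection rules in the proposal can be sketched as follows. This is a hedged illustration, not Ignite's implementation; the record, method names, and structure are assumptions. A version is visible to a snapshot when it was created at or before CurrentTS by a committed transaction and has not yet been superseded as of CurrentTS; a superseded version becomes collectible once no RUNNING transaction can still see it:

```java
import java.util.Set;

// Illustrative sketch of minTS/maxTS visibility and GC eligibility.
// minTS: tx id that created the version; maxTS: tx id that superseded it
// (0 while the version is still current).
public class MvccVisibilitySketch {
    record Version(long minTS, long maxTS, String value) {}

    /** Visible if created at or before the snapshot by a committed tx and
     *  not superseded as of the snapshot. */
    static boolean visible(Version v, long currentTS, Set<Long> runningTxs) {
        boolean created = v.minTS() <= currentTS && !runningTxs.contains(v.minTS());
        boolean notSuperseded = v.maxTS() == 0 || v.maxTS() > currentTS
                || runningTxs.contains(v.maxTS());
        return created && notSuperseded;
    }

    /** Collectible once its maxTS is below the oldest RUNNING tx id:
     *  no current or future snapshot can ever see it again. */
    static boolean collectible(Version v, long oldestRunningTx) {
        return v.maxTS() != 0 && v.maxTS() < oldestRunningTx;
    }

    public static void main(String[] args) {
        Set<Long> running = Set.of(7L);
        Version old = new Version(3, 7, "balance=100"); // being superseded by tx 7
        Version neu = new Version(7, 0, "balance=50");  // written by running tx 7

        // A reader with CurrentTS = 6 still sees the old version, not the
        // uncommitted new one.
        System.out.println(visible(old, 6, running)); // true
        System.out.println(visible(neu, 6, running)); // false
        System.out.println(collectible(old, 7));      // false: tx 7 still running
    }
}
```

The `collectible` check is the "minimal currentTX for sessions in RUNNING state" rule from the garbage-collection section: versions below that floor are invisible to every possible snapshot and can be purged.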