[ 
https://issues.apache.org/jira/browse/IGNITE-11433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-11433:
-------------------------------------
    Issue Type: Task  (was: Improvement)

> MVCC: Link entry versions at the Data Store layer.
> --------------------------------------------------
>
>                 Key: IGNITE-11433
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11433
>             Project: Ignite
>          Issue Type: Task
>          Components: mvcc, sql
>            Reporter: Igor Seliverstov
>            Priority: Major
>
> At now all tuple versions are placed inside index trees. CacheDataTree is 
> used to link versions each to other (using their order inside an index page).
> Despite the fact that this approach is easy to implement and preferable at 
> the first point, it brings several disadvantages:
> 1) We need to iterate over tuple versions at update time under a read (or 
> even write) lock on an index page which blocks other write (read) operations 
> for a relatively long period of time.
> 2) Write amplification suffers not only Data Store layer, but indexes as 
> well, which makes read/lookup ops into indexes much slower.
> 3) We cannot implement several important improvements (data streamer 
> optimizations) because having several versions of one key in an index page 
> doesn't allow using of Invoke operations.
> Using versions linking at the Data Store only (like it do other vendors) 
> solves or decreases impact of that issues.
> So, the proposed changes:
> 1) Change data page layout adding two fields into its header: {{link}} (a 
> link to the next tuple in a versions chain) and {{lock}} (a tx, which holds a 
> write lock on the HEAD of the chain) There are several possible 
> optimizations: 1) leave lock as is (in the cache index item) 2) use max 
> version as lock version as well
> 2) Do not save all versions of a tuple in indexes; this mean removing version 
> from key - newest version will overwrite an existing entry
> There are two approaches with some pros and cons of how to link versions:
> 1) N2O (newer to older) - a reader (writer) gets the newest tuple version 
> first and iterates over tuple versions from newer to older until it gets a 
> position where it's snapshot placed between min and max versions of the 
> examined tuple. Approach implies faster reads (more actual versions are get 
> first) and necessity of updating all involved indexes on each write operation 
> - slower writes in other words (may be optimized using logical pointers to 
> the head of tuple versions chain). Cooperative VAC (update operations remove 
> invisible for all readers tuple versions) is possible.
> 2) O2N (older to newer) - a reader gets the oldest visible tuple version and 
> iterates over versions until it gets visible version. It allows not to update 
> all indexes (except the case when an index value is changed), write 
> operations become lighter. Cooperative VAC almost impossible.
> We need to decide which approach to use depending on that load profile is 
> preferable (OLTP/OLAP)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to