[
https://issues.apache.org/jira/browse/IGNITE-11433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vladimir Ozerov updated IGNITE-11433:
-------------------------------------
Ignite Flags: (was: Docs Required)
> MVCC: Link entry versions at the Data Store layer.
> --------------------------------------------------
>
> Key: IGNITE-11433
> URL: https://issues.apache.org/jira/browse/IGNITE-11433
> Project: Ignite
> Issue Type: Improvement
> Components: mvcc, sql
> Reporter: Igor Seliverstov
> Priority: Major
>
> At now all tuple versions are placed inside index trees. CacheDataTree is
> used to link versions each to other (using their order inside an index page).
> Despite the fact that this approach is easy to implement and preferable at
> the first point, it brings several disadvantages:
> 1) We need to iterate over tuple versions at update time under a read (or
> even write) lock on an index page which blocks other write (read) operations
> for a relatively long period of time.
> 2) Write amplification suffers not only Data Store layer, but indexes as
> well, which makes read/lookup ops into indexes much slower.
> 3) We cannot implement several important improvements (data streamer
> optimizations) because having several versions of one key in an index page
> doesn't allow using of Invoke operations.
> Using versions linking at the Data Store only (like it do other vendors)
> solves or decreases impact of that issues.
> So, the proposed changes:
> 1) Change data page layout adding two fields into its header: {{link}} (a
> link to the next tuple in a versions chain) and {{lock}} (a tx, which holds a
> write lock on the HEAD of the chain) There are several possible
> optimizations: 1) leave lock as is (in the cache index item) 2) use max
> version as lock version as well
> 2) Do not save all versions of a tuple in indexes; this mean removing version
> from key - newest version will overwrite an existing entry
> There are two approaches with some pros and cons of how to link versions:
> 1) N2O (newer to older) - a reader (writer) gets the newest tuple version
> first and iterates over tuple versions from newer to older until it gets a
> position where it's snapshot placed between min and max versions of the
> examined tuple. Approach implies faster reads (more actual versions are get
> first) and necessity of updating all involved indexes on each write operation
> - slower writes in other words (may be optimized using logical pointers to
> the head of tuple versions chain). Cooperative VAC (update operations remove
> invisible for all readers tuple versions) is possible.
> 2) O2N (older to newer) - a reader gets the oldest visible tuple version and
> iterates over versions until it gets visible version. It allows not to update
> all indexes (except the case when an index value is changed), write
> operations become lighter. Cooperative VAC almost impossible.
> We need to decide which approach to use depending on that load profile is
> preferable (OLTP/OLAP)
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)