[ 
https://issues.apache.org/jira/browse/IGNITE-11433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Seliverstov updated IGNITE-11433:
--------------------------------------
    Description: 
At now all tuple versions are placed inside index trees. CacheDataTree is used 
to link versions each to other (using their order inside an index page).

Despite the fact that this approach is easy to implement and preferable at the 
first point, it brings several disadvantages:

1) We need to iterate over tuple versions at update time under a read (or even 
write) lock on an index page which blocks other write (read) operations for a 
relatively long period of time.
2) We index all tuple versions which makes indexes use much more space than 
needed
3) We cannot implement several important improvements (data streamer 
optimizations) because having several versions of one key in an index page 
doesn't allow using of Invoke operations.
3) Write amplification suffers not only Data Store layer, but indexes as well, 
which makes read/lookup ops into indexes much slower.

Using versions linking at the Data Store only (like it do other vendors) solves 
or decreases impact of that issues.

So, the proposed changes:

1) Change data page layout adding two fields into its header: {{link}} (a link 
to the next tuple in a versions chain) and {{lock}} (a tx, which holds a write 
lock on the HEAD of the chain) There are several possible optimizations: 1) 
leave lock as is (in the cache index item) 2) use max version as lock version 
as well
2) Do not save all versions of a tuple in indexes; this mean removing version 
from key - newest version will overwrite an existing entry

There are two approaches with some pros and cons of how to link versions:

1) N2O (newer to older) - a reader (writer) gets the newest tuple version first 
and iterates over tuple versions from newer to older until it gets a position 
where it's snapshot placed between min and max versions of the examined tuple. 
Approach implies faster reads (more actual versions are get first) and 
necessity of updating all involved indexes on each write operation - slower 
writes in other words (may be optimized using logical pointers to the head of 
tuple versions chain). Cooperative VAC (update operations remove invisible for 
all readers tuple versions) is possible.
2) O2N (older to newer) - a reader gets the oldest visible tuple version and 
iterates over versions until it gets visible version. It allows not to update 
all indexes (except the case when an index value is changed), write operations 
become lighter. Cooperative VAC almost impossible.

We need to decide which approach to use depending on that load profile is 
preferable (OLTP/OLAP)

  was:
At now all tuple versions are placed inside index trees. CacheDataTree is used 
to link versions each to other (using their order inside a data page).

Despite the fact that this approach is easy to implement and preferable at the 
first point, it brings several disadvantages:

1) We need to iterate over tuple versions at update time under a read (or even 
write) lock on an index page which blocks other write (read) operations for a 
relatively long period of time.
2) We index all tuple versions which makes indexes use much more space than 
needed
3) We cannot implement several important improvements (data streamer 
optimizations) because having several versions of one key in an index page 
doesn't allow using of Invoke operations.
3) Write amplification suffers not only Data Store layer, but indexes as well, 
which makes read/lookup ops into indexes much slower.

Using versions linking at the Data Store only (like it do other vendors) solves 
or decreases impact of that issues.

So, the proposed changes:

1) Change data page layout adding two fields into its header: {{link}} (a link 
to the next tuple in a versions chain) and {{lock}} (a tx, which holds a write 
lock on the HEAD of the chain) There are several possible optimizations: 1) 
leave lock as is (in the cache index item) 2) use max version as lock version 
as well
2) Do not save all versions of a tuple in indexes; this mean removing version 
from key - newest version will overwrite an existing entry

There are two approaches with some pros and cons of how to link versions:

1) N2O (newer to older) - a reader (writer) gets the newest tuple version first 
and iterates over tuple versions from newer to older until it gets a position 
where it's snapshot placed between min and max versions of the examined tuple. 
Approach implies faster reads (more actual versions are get first) and 
necessity of updating all involved indexes on each write operation - slower 
writes in other words (may be optimized using logical pointers to the head of 
tuple versions chain). Cooperative VAC (update operations remove invisible for 
all readers tuple versions) is possible.
2) O2N (older to newer) - a reader gets the oldest visible tuple version and 
iterates over versions until it gets visible version. It allows not to update 
all indexes (except the case when an index value is changed), write operations 
become lighter. Cooperative VAC almost impossible.

We need to decide which approach to use depending on that load profile is 
preferable (OLTP/OLAP)


> MVCC: Link entry versions at the Data Store layer.
> --------------------------------------------------
>
>                 Key: IGNITE-11433
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11433
>             Project: Ignite
>          Issue Type: Improvement
>          Components: mvcc, sql
>            Reporter: Igor Seliverstov
>            Priority: Major
>
> At now all tuple versions are placed inside index trees. CacheDataTree is 
> used to link versions each to other (using their order inside an index page).
> Despite the fact that this approach is easy to implement and preferable at 
> the first point, it brings several disadvantages:
> 1) We need to iterate over tuple versions at update time under a read (or 
> even write) lock on an index page which blocks other write (read) operations 
> for a relatively long period of time.
> 2) We index all tuple versions which makes indexes use much more space than 
> needed
> 3) We cannot implement several important improvements (data streamer 
> optimizations) because having several versions of one key in an index page 
> doesn't allow using of Invoke operations.
> 3) Write amplification suffers not only Data Store layer, but indexes as 
> well, which makes read/lookup ops into indexes much slower.
> Using versions linking at the Data Store only (like it do other vendors) 
> solves or decreases impact of that issues.
> So, the proposed changes:
> 1) Change data page layout adding two fields into its header: {{link}} (a 
> link to the next tuple in a versions chain) and {{lock}} (a tx, which holds a 
> write lock on the HEAD of the chain) There are several possible 
> optimizations: 1) leave lock as is (in the cache index item) 2) use max 
> version as lock version as well
> 2) Do not save all versions of a tuple in indexes; this mean removing version 
> from key - newest version will overwrite an existing entry
> There are two approaches with some pros and cons of how to link versions:
> 1) N2O (newer to older) - a reader (writer) gets the newest tuple version 
> first and iterates over tuple versions from newer to older until it gets a 
> position where it's snapshot placed between min and max versions of the 
> examined tuple. Approach implies faster reads (more actual versions are get 
> first) and necessity of updating all involved indexes on each write operation 
> - slower writes in other words (may be optimized using logical pointers to 
> the head of tuple versions chain). Cooperative VAC (update operations remove 
> invisible for all readers tuple versions) is possible.
> 2) O2N (older to newer) - a reader gets the oldest visible tuple version and 
> iterates over versions until it gets visible version. It allows not to update 
> all indexes (except the case when an index value is changed), write 
> operations become lighter. Cooperative VAC almost impossible.
> We need to decide which approach to use depending on that load profile is 
> preferable (OLTP/OLAP)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to