[jira] [Created] (IGNITE-20072) Reduce waiting time in partition listener

Ivan Bessonov (Jira) Thu, 27 Jul 2023 04:18:07 -0700

Ivan Bessonov created IGNITE-20072:
--------------------------------------

             Summary: Reduce waiting time in partition listener
                 Key: IGNITE-20072
                 URL: https://issues.apache.org/jira/browse/IGNITE-20072
             Project: Ignite
          Issue Type: Improvement
            Reporter: Ivan Bessonov



h3. The problem

For every row that we update, we must take a lock for the rowId. This 
guarantees consistency of data and indexes. However, there are downsides:
 * we must allocate locks, as well as a collection for locks;
 * we may wait for some time while GC or another background process does its 
thing.

Ideally, no locks should be used. We must investigate the issue and reduce 
waiting time or eliminate it completely. Following is a short explanation of 
what's going on right now, and some ideas about possible fixes.
h3. Full state transfer (snapshot) lock

{{OutgoingSnapshot#acquireMvLock}}

This lock is required to coordinate the partition listener and the 
snapshot output process. The idea is the following:
 * the "snapshot reader" iterates the partition;
 * if there's a concurrent load, we "notify" the reader that certain version 
chains are updated, and that their previous state must be preserved.

Both of these steps take an exclusive lock to process data. This is not optimal.

There is (almost) a way around it: persisting the update timestamp together with 
write intents. This way, the snapshot will effectively be equivalent to a "scan by 
timestamp" that also includes write intents.
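
A minimal sketch of the idea, not actual Ignite code: the {{Version}} class, its fields and {{visibleAt}} are hypothetical names. It assumes a version chain is kept newest-first and that a write intent stores its update timestamp in the same field where a committed version stores its commit timestamp.

```java
import java.util.ArrayList;
import java.util.List;

final class TimestampScanSketch {
    /** Hypothetical row version. For a write intent, timestamp is the persisted update timestamp. */
    static final class Version {
        final long timestamp;
        final boolean writeIntent;
        final String value;

        Version(long timestamp, boolean writeIntent, String value) {
            this.timestamp = timestamp;
            this.writeIntent = writeIntent;
            this.value = value;
        }
    }

    /** Returns the versions of a newest-first chain that already existed at snapshotTs, write intents included. */
    static List<Version> visibleAt(List<Version> chain, long snapshotTs) {
        List<Version> result = new ArrayList<>();
        for (Version v : chain) {
            if (v.timestamp <= snapshotTs) {
                result.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Version> chain = List.of(
                new Version(30, true, "c"),   // write intent, updated at 30
                new Version(20, false, "b"),  // committed at 20
                new Version(10, false, "a")); // committed at 10
        for (Version v : visibleAt(chain, 25)) {
            System.out.println(v.timestamp + " " + v.value);
        }
    }
}
```

With such a filter the reader would not need to be notified about versions added after the snapshot timestamp; it simply skips them.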

One part that makes this process not optimal is updating/committing current 
write intents. These two operations replace the head of the version chain, 
losing its previous value, making "pure" timestamp scan impossible.

Another point is the time that we hold the lock in the snapshot reader. The 
lock must cover a single entry at most. Ideally, the partition listener should 
have higher priority while acquiring that lock. This is easy to do with the 
row-level locks that we use for GC.
We can also avoid reading the entire version chain under a single lock by 
doing it in several sessions, and we should do this. The Justin 
Bieber Problem is not a joke. The time that we hold any lock must be 
predictable and limited by some known bounds.
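
A rough sketch of reading a chain in several sessions, under stated assumptions: {{Row}}, {{chain}} and {{readInSessions}} are hypothetical names, the chain is modelled as a {{TreeMap}} keyed by timestamp, and new versions always get larger timestamps than existing ones. The reader holds the row lock for at most {{maxPerSession}} entries, releases it, and later resumes below the last timestamp it has seen. The sketch does not implement lock priority for the partition listener.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantLock;

final class ChunkedChainReadSketch {
    /** Hypothetical per-row state: a row-level lock and a version chain keyed by timestamp. */
    static final class Row {
        final ReentrantLock lock = new ReentrantLock();
        final TreeMap<Long, String> chain = new TreeMap<>(); // guarded by lock
    }

    /** Reads all version timestamps, newest first, holding the lock for at most maxPerSession entries at a time. */
    static List<Long> readInSessions(Row row, int maxPerSession) {
        if (maxPerSession < 1) {
            throw new IllegalArgumentException("maxPerSession must be positive");
        }
        List<Long> out = new ArrayList<>();
        long lastSeen = Long.MAX_VALUE;
        while (true) {
            int taken = 0;
            row.lock.lock();
            try {
                // Seek to the versions strictly older than the last one read.
                for (long ts : row.chain.headMap(lastSeen, false).descendingKeySet()) {
                    if (taken == maxPerSession) {
                        break;
                    }
                    out.add(ts);
                    lastSeen = ts;
                    taken++;
                }
            } finally {
                row.lock.unlock(); // other threads can take the lock between sessions
            }
            if (taken < maxPerSession) {
                return out; // the chain is exhausted
            }
        }
    }

    public static void main(String[] args) {
        Row row = new Row();
        for (long ts = 1; ts <= 5; ts++) {
            row.chain.put(ts, "v" + ts);
        }
        System.out.println(readInSessions(row, 2));
    }
}
```

The lock hold time is then bounded by {{maxPerSession}} rather than by the chain length, which is the property asked for above.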

One thing to consider: we should probably perform write intent resolution in 
the snapshot reader. Otherwise the receiver node will have to do it manually later, 
but with no explicit notification that some of the transactions are already 
committed or rolled back.
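
For illustration only, resolution in the reader could look roughly like the following. All names ({{TxState}}, {{TxOutcome}}, {{Version}}, {{resolve}}) are hypothetical, and the map of transaction outcomes stands in for whatever source of transaction state the reader would actually consult.

```java
import java.util.Map;
import java.util.Optional;

final class WriteIntentResolutionSketch {
    enum TxState { PENDING, COMMITTED, ABORTED }

    /** Hypothetical transaction outcome; commitTs is meaningful only for COMMITTED. */
    static final class TxOutcome {
        final TxState state;
        final long commitTs;

        TxOutcome(TxState state, long commitTs) {
            this.state = state;
            this.commitTs = commitTs;
        }
    }

    /** Hypothetical chain head: txId is non-null only for a write intent. */
    static final class Version {
        final String value;
        final String txId;
        final long commitTs;

        Version(String value, String txId, long commitTs) {
            this.value = value;
            this.txId = txId;
            this.commitTs = commitTs;
        }
    }

    /** Returns the head to send: committed, still pending, or empty if its transaction was rolled back. */
    static Optional<Version> resolve(Version head, Map<String, TxOutcome> txOutcomes) {
        if (head.txId == null) {
            return Optional.of(head); // already committed, nothing to resolve
        }
        TxOutcome outcome = txOutcomes.get(head.txId);
        if (outcome == null || outcome.state == TxState.PENDING) {
            return Optional.of(head); // outcome unknown, send it as a write intent
        }
        if (outcome.state == TxState.COMMITTED) {
            return Optional.of(new Version(head.value, null, outcome.commitTs));
        }
        return Optional.empty(); // rolled back, the write intent is dropped
    }

    public static void main(String[] args) {
        Map<String, TxOutcome> outcomes = Map.of("tx1", new TxOutcome(TxState.COMMITTED, 42));
        Version resolved = resolve(new Version("a", "tx1", 0), outcomes).get();
        System.out.println(resolved.value + " committed at " + resolved.commitTs);
    }
}
```

The receiver would then get only committed versions plus the write intents whose transactions are still in flight.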
h3. GC locks

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
