Vladimir,

No, it’s crystal clear, thanks.
If this approach works only for Ignite persistence-based deployments, how will
we handle locking for the pure in-memory and 3rd party database caching
scenarios? As I understand it, the tuples will still be stored in the page
memory, while there won’t be any opportunity to fall back to disk if the
memory usage exceeds some threshold.

—
Denis

> On Dec 13, 2017, at 11:21 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:
>
> Denis,
>
> Sorry, maybe I was not clear enough - the "tuple approach" and the
> "persistent approach" are the same thing. By "tuple" I mean a row stored
> inside a data block. Currently we store lock information in the Java heap,
> and the proposal is to move it to data blocks. The main driver is memory -
> if there are a lot of rows to be locked, we will either run out of memory
> or produce serious memory pressure. For example, currently an update of 1M
> entries will consume ~500Mb of heap. With the proposed approach it will
> consume almost nothing. The drawback is an increased number of dirty data
> pages, but that should not be a problem because in the final implementation
> we will update data rows before the prepare phase anyway, so I do not
> expect any write amplification in the usual case.
>
> This approach is only applicable to Ignite persistence.
>
> On Thu, Dec 14, 2017 at 1:53 AM, Denis Magda <dma...@apache.org> wrote:
>
>> Vladimir,
>>
>> Thanks for the thorough overview and proposal.
>>
>>> Also we could try employing a tiered approach:
>>> 1) Try to keep everything in-memory to minimize writes to blocks
>>> 2) Fall back to persistent lock data if a certain threshold is reached.
>>
>> What are the benefits of the backed-by-persistence approach compared to
>> the one based on tuples? Specifically:
>> - will the persistence approach work for both 3rd party and Ignite
>> persistence?
>> - any performance impacts depending on the chosen method?
>> - what’s faster to implement?
>>
>> —
>> Denis
>>
>>> On Dec 13, 2017, at 2:10 AM, Vladimir Ozerov <voze...@gridgain.com>
>> wrote:
>>>
>>> Igniters,
>>>
>>> As you probably know, we are actively working on the MVCC [1] and
>>> transactional SQL [2] features, which could be treated as a single huge
>>> improvement. We face a number of challenges, and one of them is locking.
>>>
>>> At the moment, information about all locks is kept in memory on a
>>> per-entry basis (see GridCacheMvccManager). For every locked key we
>>> maintain the current lock owner (XID) and the list of would-be-owner
>>> transactions. When a transaction is about to lock an entry, two
>>> scenarios are possible:
>>> 1) If the entry is not locked, we obtain the lock immediately.
>>> 2) If the entry is locked, we add the current transaction to the wait
>>> list and jump to the next entry to be locked. Once the first entry is
>>> released by the conflicting transaction, the current transaction becomes
>>> the owner of the first entry and tries to promote itself for the
>>> subsequent entries.
>>>
>>> Once all required locks are obtained, a response is sent to the caller.
>>>
>>> This approach doesn't work well for transactional SQL - if we update
>>> millions of rows in a single transaction, we will simply run out of
>>> memory. To mitigate the problem, other database vendors keep information
>>> about locks inside the tuples. I propose to apply a similar design, as
>>> follows:
>>>
>>> 1) No per-entry lock information is stored in memory anymore.
>>> 2) The list of active transactions is still maintained in memory.
>>> 3) When a TX locks an entry, it sets a special marker on the tuple [3].
>>> 4) When a TX meets an already locked entry, it enlists itself in the
>>> wait queue of the conflicting transaction and suspends.
>>> 5) When the first transaction releases the conflicting lock, it notifies
>>> and wakes up the suspended transactions, so they resume locking.
>>> 6) Entry lock data is cleared on transaction commit.
>>> 7) Entry lock data is not cleared on rollback or node restart; instead,
>>> we could use the active transactions list to identify invalid locks and
>>> overwrite them as needed.
>>>
>>> Also, we could try employing a tiered approach:
>>> 1) Try to keep everything in-memory to minimize writes to blocks.
>>> 2) Fall back to persistent lock data if a certain threshold is reached.
>>>
>>> Thoughts?
>>>
>>> [1] https://issues.apache.org/jira/browse/IGNITE-3478
>>> [2] https://issues.apache.org/jira/browse/IGNITE-4191
>>> [3] Depends on the final MVCC design - it could be a per-tuple XID, undo
>>> vectors, per-block transaction lists, etc.
>>>
>>> Vladimir.
>>
>>
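For illustration, here is a minimal sketch of the tuple-marker locking flow
from steps 1-7 quoted above. All names (TupleLockSketch, Tuple, Tx, lockXid,
activeTxs) are hypothetical, and the actual layout depends on the final MVCC
design [3] - this is a sketch of the idea, not Ignite's real internals.

import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/**
 * Sketch of tuple-level locking: the lock owner (XID) lives in the data
 * row itself instead of a heap-based per-entry lock manager.
 */
class TupleLockSketch {
    /** Marker value meaning "row is not locked". */
    static final long NO_XID = 0;

    /** A row stored inside a data block. */
    static class Tuple {
        volatile long lockXid = NO_XID; // per-tuple lock marker (step 3)
        Object value;
    }

    /** Transaction descriptor kept in the active transactions list. */
    static class Tx {
        final long xid;
        // Transactions suspended on locks held by this TX (step 4).
        final Queue<Tx> waiters = new ConcurrentLinkedQueue<>();
        Tx(long xid) { this.xid = xid; }
    }

    // Step 2: the only global in-memory state is the active TX list;
    // there is no per-entry lock table anymore (step 1).
    final Map<Long, Tx> activeTxs = new ConcurrentHashMap<>();

    /** Tries to lock the tuple for tx; returns false if tx must suspend. */
    boolean lockOrEnqueue(Tuple t, Tx tx) {
        synchronized (t) { // stand-in for a data-page latch
            long owner = t.lockXid;
            Tx ownerTx = owner == NO_XID ? null : activeTxs.get(owner);
            if (ownerTx == null || owner == tx.xid) {
                // Free, already ours, or a stale marker left by a rolled
                // back / restarted TX (step 7) - just overwrite it.
                t.lockXid = tx.xid;
                return true;
            }
            // Step 4: enlist into the conflicting TX's wait queue.
            // (Simplified: a real implementation must handle the owner
            // committing concurrently with this enlistment.)
            ownerTx.waiters.add(tx);
            return false;
        }
    }

    /** Steps 5 and 6: clear markers on commit and wake up waiters. */
    void commit(Tx tx, List<Tuple> locked) {
        for (Tuple t : locked) {
            synchronized (t) {
                if (t.lockXid == tx.xid)
                    t.lockXid = NO_XID; // step 6: cleared on commit only
            }
        }
        activeTxs.remove(tx.xid);
        for (Tx w; (w = tx.waiters.poll()) != null; )
            resume(w); // step 5: suspended TXs retry their lock requests
    }

    void resume(Tx tx) { /* re-run lockOrEnqueue for tx's pending keys */ }
}

Note the sketch keeps the marker in an in-memory field for brevity; in the
proposal it would be written into the data block itself, and the tiered
variant would only spill markers to blocks past a memory threshold.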