Vladimir,

No it’s crystal clear, thanks.

If this approach works only for Ignite persistence based deployment, how will 
we handle locking for pure in-memory and caching of 3rd party databases 
scenarios? As I understand the tuples still will be stored in the page memory 
while there won’t be any opportunity to fallback to disk if the memory usage 
increases some threshold.

—
Denis

> On Dec 13, 2017, at 11:21 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:
> 
> Denis,
> 
> Sorry, may be I was not clear enough - "tuple-approach" and "persistent
> approach" are the same. By "tuple" I mean a row stored inside a data block.
> Currently we store lock information in Java heap and proposal is to move it
> to data blocks. The main driver is memory - if there are a rows to be
> locked we will either run out of memory, or produce serious memory
> pressure. For example, currently update of 1M entries will consume ~500Mb
> of heap. With proposed approach it will consume almost nothing. The
> drawback is increased number of dirty data pages, but it should not be a
> problem because in final implementation we will update data rows before
> prepare phase anyway, so I do not expect any write amplification in usual
> case.
> 
> This approach is only applicable for Ignite persistence.
> 
> On Thu, Dec 14, 2017 at 1:53 AM, Denis Magda <dma...@apache.org> wrote:
> 
>> Vladimir,
>> 
>> Thanks for a throughout overview and proposal.
>> 
>>> Also we could try employing tiered approach
>>> 1) Try to keep everything in-memory to minimize writes to blocks
>>> 2) Fallback to persistent lock data if certain threshold is reached.
>> 
>> What are the benefits of the backed-by-persistence approach in compare to
>> the one based on tuples? Specifically:
>> - will the persistence approach work for both 3rd party and Ignite
>> persistence?
>> - any performance impacts depending on a chosen method?
>> - what’s faster to implement?
>> 
>> —
>> Denis
>> 
>>> On Dec 13, 2017, at 2:10 AM, Vladimir Ozerov <voze...@gridgain.com>
>> wrote:
>>> 
>>> Igniters,
>>> 
>>> As you probably we know we work actively on MVCC [1] and transactional
>> SQL
>>> [2] features which could be treated as a single huge improvement. We
>> face a
>>> number of challenges and one of them is locking.
>>> 
>>> At the moment information about all locks is kept in memory on per-entry
>>> basis (see GridCacheMvccManager). For every locked key we maintain
>> current
>>> lock owner (XID) and the list of would-be-owner transactions. When
>>> transaction is about to lock an entry two scenarios are possible:
>>> 1) If entry is not locked we obtain the lock immediately
>>> 2) if entry is locked we add current transaction to the wait list and
>> jumps
>>> to the next entry to be locked. Once the first entry is released by
>>> conflicting transaction, current transaction becomes an owner of the
>> first
>>> entry and tries to promote itself for subsequent entries.
>>> 
>>> Once all required locks are obtained, response is sent to the caller.
>>> 
>>> This approach doesn't work well for transactional SQL - if we update
>>> millions of rows in a single transaction we will simply run out of
>> memory.
>>> To mitigate the problem other database vendors keep information about
>> locks
>>> inside the tuples. I propose to apply the similar design as follows:
>>> 
>>> 1) No per-entry lock information is stored in memory anymore.
>>> 2) The list of active transactions are maintained in memory still
>>> 3) When TX locks an entry, it sets special marker to the tuple [3]
>>> 4) When TX meets already locked entry, it enlists itself to wait queue of
>>> conflicting transaction and suspends
>>> 5) When first transaction releases conflicting lock, it notifies and
>> wakes
>>> up suspended transactions, so they resume locking
>>> 6) Entry lock data is cleared on transaction commit
>>> 7) Entry lock data is not cleared on rollback or node restart; Instead,
>> we
>>> will could use active transactions list to identify invalid locks and
>>> overwrite them as needed.
>>> 
>>> Also we could try employing tiered approach
>>> 1) Try to keep everything in-memory to minimize writes to blocks
>>> 2) Fallback to persistent lock data if certain threshold is reached.
>>> 
>>> Thoughts?
>>> 
>>> [1] https://issues.apache.org/jira/browse/IGNITE-3478
>>> [2] https://issues.apache.org/jira/browse/IGNITE-4191
>>> [3] Depends on final MVCC design - it could be per-tuple XID, undo
>> vectors,
>>> per-block transaction lists, etc..
>>> 
>>> Vladimir.
>> 
>> 

Reply via email to