[jira] Updated: (DERBY-2991) Index split deadlock

Knut Anders Hatlen (JIRA) Thu, 26 Feb 2009 05:23:29 -0800

     [ 
https://issues.apache.org/jira/browse/DERBY-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Knut Anders Hatlen updated DERBY-2991:
--------------------------------------

    Attachment: d2991-2a.stat
                d2991-2a.diff

I'm attaching a new patch (d2991-2a.diff) which has these changes
compared to the preview-1e patch:

- added more comments and removed mentioning of scan lock in (some of
  the) existing comments

- removed the savedLockedPos flag from BTreeRowPosition and moved back
  to the original approach of just using current_rh != null to
  indicate that the row on the current position was locked. This also
  allows cheap repositioning by record id in more cases than before
  because current_rh is not null as often as it was with the
  preview-1e patch

- fixed the bug exposed by testBTreeMaxScan_fetchMaxRowFromBeginning
  in IndexSplitDeadlockTest. The problem was that one of the methods
  was passed the scan_position instance variable as an argument. When
  that method called reopenScan(), the instance variable would be
  replaced, but the local variable would remain the same, and this
  inconsistency triggered an assert in the added code. Solved by
  removing the position from the argument list and instead using the
  instance variable directly

- fixed the bug exposed by
  testBTreeForwardScan_fetchRows_resumeScanAfterCompress. Solved by
  fetching the page with another method that didn't fail if the page
  had disappeared, and reposition from the root of the B-tree if the
  page was removed by SYSCS_COMPRESS_TABLE.

- updated canon for store/updatelocksJDBC30.sql which has been added
  to derbyall recently

I have just realized that there are some more problems that need to be
resolved, but I think they will have just minor effect on the patch,
so I'm posting it anyway to allow others to take a look at it and see
if there are more serious problems with it.

Here's a description of the changes in each file touched by the patch
(excluding the test files which were just updated so that they didn't
expect scan locks in the lock tables, and so that they didn't expect
index split deadlocks):

* impl/store/access/sort/Scan.java

Removed savePosition() method from the interface because the position
is now always saved by key when the scan doesn't hold a latch on the
leaf.

* impl/store/access/btree/LeafControlRow.java

Removed calls to Scan.saveScanPositions() and
BTreeLockingPolicy.lockScan() before splitting the page, because
positions are always saved by the scan itself now, and because we
don't use scan locks anymore.

Set a hint in the page object after splitting to notify the scans that
they must reposition from the root of the B-tree after a split.

* impl/store/access/btree/BTreeController.java

Don't lock scan before purging committed deletes. Set a hint in the
page object to tell the scans that they must reposition from the root
of the B-tree since the row may not be there anymore.

* impl/store/access/btree/BTreeScan.java

Remove locking/unlocking of scan.

Update savePosition() to allow the position to be saved by record id
and by key at the same time, and make it possible to pass in a partial
or a full key to reduce the number of slots that must be fetched from
the page.

Update reposition() to reposition by record id if possible and by key
if needed. Repositioning by record id (which is cheaper) is possible
if the row is guaranteed to be on the same page as when the position
was saved (that is, no split or purge operation has been performed on
the page after saving the position).

Use savePositionAndReleasePage() when returning from the scan in
delete(), fetch(), doesCurrentPositionQualify() and
isCurrentPositionDeleted().

* impl/store/access/btree/BTreeMaxScan.java

Remove the BTreeRowPosition argument from fetchMaxRowFromBeginning()
to prevent that it goes out of sync with the instance variable
scan_position when reopenScan() is called. Use the fresh instance
variable instead.

Remove references to the scan protection handle.

* impl/store/access/btree/BTreeRowPosition.java

Add more state (and methods to access the state):

  - parent of the position. That is, which scan owns this
    position. This was needed to allow the position to be saved from
    some methods that didn't know which scan it belonged to. Could
    alternatively be solved by passing the scan as an argument to
    those methods (or actually to those methods and their callers)
    which is probably cleaner, but could be performed in a later
    clean-up

  - version number of the current leaf page when the position was
    saved (used to determine whether full repositioning is needed
    because of split, etc.)

  - template for the key to save (to prevent allocation each time the
    key is saved). When the position is saved by key, this is
    identical to current_positionKey. A separate field was added so
    that we keep the object even when current_positionKey is nulled
    out, but a possibly cleaner solution would be to have just a flag
    telling whether the value in current_positionKey is valid, and
    never reset current_positionKey to null. Could be done in a later
    clean-up

  - fetch descriptor used to fetch the rest of the key in the cases
    where the scan has already fetched parts of the key before saving
    the position

* impl/store/access/btree/BTreePostCommit.java

purgeRowLevelCommittedDeletes() sets a hint in the page object to
force scans to reposition from the root of the B-tree when at least
one row has been purged from the page.

I think the same change should have been made in
purgeCommittedDeletes(). I missed it because the method assumed an
exclusive table lock and therefore didn't need a scan lock. Will
update the patch later.

* impl/store/access/btree/OpenBTree.java

Remove references to the removed saveScanPositions() method and to the
protection record handle.

Make the debug code that simulates release of latches save the
position since that's what happens if the latches really are released
by the production code now.

* impl/store/access/btree/index/B2IRowLockingRR.java
* impl/store/access/btree/index/B2INoLocking.java
* impl/store/access/btree/index/B2IRowLocking1.java
* impl/store/access/btree/index/B2IRowLocking3.java
* impl/store/access/btree/BTreeLockingPolicy.java

Remove request_scan_lock parameter.

Remove code to lock/unlock scan.

Save position of scan if lock cannot be granted immediately.

* impl/store/access/btree/BTreeForwardScan.java

Save position by key each time the scan returns a group of rows. Use
the partial (possibly full) key fetched by the scan to make the save
position operation cheaper.

* impl/store/access/RAMTransaction.java
* impl/store/access/heap/HeapScan.java
* iapi/store/access/conglomerate/ScanManager.java
* iapi/store/access/conglomerate/TransactionManager.java

Remove saveScanPositions()/savePosition() because the position will
already have been saved now since we always save the position when we
release the latch.

* impl/store/access/heap/HeapRowLocation.java

Remove THROWASSERT from and complete implementation of
setFrom(DataValueDescriptor) to allow the RowLocation in the index row
to be copied when we save the position.

* impl/store/raw/data/BasePage.java
* iapi/store/raw/Page.java

Remove reference to protection record handle.

Add code to set the hint that repositioning from the root of the
B-tree is needed.

* iapi/store/raw/RecordHandle.java

Remove constant identifying the record protection handle that we no
longer use.


> Index split deadlock
> --------------------
>
>                 Key: DERBY-2991
>                 URL: https://issues.apache.org/jira/browse/DERBY-2991
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.2.2.0, 10.3.1.4
>         Environment: Windows XP, Java 6
>            Reporter: Bogdan Calmac
>            Assignee: Knut Anders Hatlen
>         Attachments: d2991-2a.diff, d2991-2a.stat, d2991-preview-1a.diff, 
> d2991-preview-1a.stat, d2991-preview-1b.diff, d2991-preview-1b.stat, 
> d2991-preview-1c.diff, d2991-preview-1c.stat, d2991-preview-1d.diff, 
> d2991-preview-1d.stat, d2991-preview-1e.diff, derby.log, 
> InsertSelectDeadlock.java, perftest.diff, Repro2991.java, 
> stacktraces_during_deadlock.txt, test-1.diff, test-2.diff, test-3.diff
>
>
> After doing dome research on the mailing list, it appears that the index 
> split deadlock is a known behaviour, so I will start by describing the 
> theoretical problem first and then follow with the details of my test case.
> If you have concurrent select and insert transactions on the same table, the 
> observed locking behaviour is as follows:
>  - the select transaction acquires an S lock on the root block of the index 
> and then waits for an S lock on some uncommitted row of the insert transaction
>  - the insert transaction acquires X locks on the inserted records and if it 
> needs to do an index split creates a sub-transaction that tries to acquire an 
> X lock on the root block of the index
> In summary: INDEX LOCK followed by ROW LOCK + ROW LOCK followed by INDEX LOCK 
> = deadlock
> In the case of my project this is an important issue (lack of concurrency 
> after being forced to use table level locking) and I would like to contribute 
> to the project and fix this issue (if possible). I was wondering if someone 
> that knows the code can give me a few pointers on the implications of this 
> issue:
>  - Is this a limitation of the top-down algorithm used?
>  - Would fixing it require to use a bottom up algorithm for better 
> concurrency (which is certainly non trivial)?
>  - Trying to break the circular locking above, I would first question why 
> does the select transaction need to acquire (and hold) a lock on the root 
> block of the index. Would it be possible to ensure the consistency of the 
> select without locking the index?
> -----
> The attached test (InsertSelectDeadlock.java) tries to simulate a typical 
> data collection application, it consists of: 
>  - an insert thread that inserts records in batch 
>  - a select thread that 'processes' the records inserted by the other thread: 
> 'select * from table where id > ?' 
> The derby log provides detail about the deadlock trace and 
> stacktraces_during_deadlock.txt shows that the inser thread is doing an index 
> split.
> The test was run on 10.2.2.0 and 10.3.1.4 with identical behaviour.
> Thanks,
> Bogdan Calmac.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-2991) Index split deadlock

Reply via email to