[
https://issues.apache.org/jira/browse/DERBY-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Knut Anders Hatlen updated DERBY-2991:
--------------------------------------
Attachment: d2991-2a.stat
d2991-2a.diff
I'm attaching a new patch (d2991-2a.diff) which has these changes
compared to the preview-1e patch:
- added more comments and removed mentioning of scan lock in (some of
the) existing comments
- removed the savedLockedPos flag from BTreeRowPosition and moved back
to the original approach of just using current_rh != null to
indicate that the row on the current position was locked. This also
allows cheap repositioning by record id in more cases than before
because current_rh is not null as often as it was with the
preview-1e patch
- fixed the bug exposed by testBTreeMaxScan_fetchMaxRowFromBeginning
in IndexSplitDeadlockTest. The problem was that one of the methods
was passed the scan_position instance variable as an argument. When
that method called reopenScan(), the instance variable would be
replaced, but the local variable would remain the same, and this
inconsistency triggered an assert in the added code. Solved by
removing the position from the argument list and instead using the
instance variable directly
- fixed the bug exposed by
testBTreeForwardScan_fetchRows_resumeScanAfterCompress. Solved by
fetching the page with another method that didn't fail if the page
had disappeared, and reposition from the root of the B-tree if the
page was removed by SYSCS_COMPRESS_TABLE.
- updated canon for store/updatelocksJDBC30.sql which has been added
to derbyall recently
I have just realized that there are some more problems that need to be
resolved, but I think they will have just minor effect on the patch,
so I'm posting it anyway to allow others to take a look at it and see
if there are more serious problems with it.
Here's a description of the changes in each file touched by the patch
(excluding the test files which were just updated so that they didn't
expect scan locks in the lock tables, and so that they didn't expect
index split deadlocks):
* impl/store/access/sort/Scan.java
Removed savePosition() method from the interface because the position
is now always saved by key when the scan doesn't hold a latch on the
leaf.
* impl/store/access/btree/LeafControlRow.java
Removed calls to Scan.saveScanPositions() and
BTreeLockingPolicy.lockScan() before splitting the page, because
positions are always saved by the scan itself now, and because we
don't use scan locks anymore.
Set a hint in the page object after splitting to notify the scans that
they must reposition from the root of the B-tree after a split.
* impl/store/access/btree/BTreeController.java
Don't lock scan before purging committed deletes. Set a hint in the
page object to tell the scans that they must reposition from the root
of the B-tree since the row may not be there anymore.
* impl/store/access/btree/BTreeScan.java
Remove locking/unlocking of scan.
Update savePosition() to allow the position to be saved by record id
and by key at the same time, and make it possible to pass in a partial
or a full key to reduce the number of slots that must be fetched from
the page.
Update reposition() to reposition by record id if possible and by key
if needed. Repositioning by record id (which is cheaper) is possible
if the row is guaranteed to be on the same page as when the position
was saved (that is, no split or purge operation has been performed on
the page after saving the position).
Use savePositionAndReleasePage() when returning from the scan in
delete(), fetch(), doesCurrentPositionQualify() and
isCurrentPositionDeleted().
* impl/store/access/btree/BTreeMaxScan.java
Remove the BTreeRowPosition argument from fetchMaxRowFromBeginning()
to prevent that it goes out of sync with the instance variable
scan_position when reopenScan() is called. Use the fresh instance
variable instead.
Remove references to the scan protection handle.
* impl/store/access/btree/BTreeRowPosition.java
Add more state (and methods to access the state):
- parent of the position. That is, which scan owns this
position. This was needed to allow the position to be saved from
some methods that didn't know which scan it belonged to. Could
alternatively be solved by passing the scan as an argument to
those methods (or actually to those methods and their callers)
which is probably cleaner, but could be performed in a later
clean-up
- version number of the current leaf page when the position was
saved (used to determine whether full repositioning is needed
because of split, etc.)
- template for the key to save (to prevent allocation each time the
key is saved). When the position is saved by key, this is
identical to current_positionKey. A separate field was added so
that we keep the object even when current_positionKey is nulled
out, but a possibly cleaner solution would be to have just a flag
telling whether the value in current_positionKey is valid, and
never reset current_positionKey to null. Could be done in a later
clean-up
- fetch descriptor used to fetch the rest of the key in the cases
where the scan has already fetched parts of the key before saving
the position
* impl/store/access/btree/BTreePostCommit.java
purgeRowLevelCommittedDeletes() sets a hint in the page object to
force scans to reposition from the root of the B-tree when at least
one row has been purged from the page.
I think the same change should have been made in
purgeCommittedDeletes(). I missed it because the method assumed an
exclusive table lock and therefore didn't need a scan lock. Will
update the patch later.
* impl/store/access/btree/OpenBTree.java
Remove references to the removed saveScanPositions() method and to the
protection record handle.
Make the debug code that simulates release of latches save the
position since that's what happens if the latches really are released
by the production code now.
* impl/store/access/btree/index/B2IRowLockingRR.java
* impl/store/access/btree/index/B2INoLocking.java
* impl/store/access/btree/index/B2IRowLocking1.java
* impl/store/access/btree/index/B2IRowLocking3.java
* impl/store/access/btree/BTreeLockingPolicy.java
Remove request_scan_lock parameter.
Remove code to lock/unlock scan.
Save position of scan if lock cannot be granted immediately.
* impl/store/access/btree/BTreeForwardScan.java
Save position by key each time the scan returns a group of rows. Use
the partial (possibly full) key fetched by the scan to make the save
position operation cheaper.
* impl/store/access/RAMTransaction.java
* impl/store/access/heap/HeapScan.java
* iapi/store/access/conglomerate/ScanManager.java
* iapi/store/access/conglomerate/TransactionManager.java
Remove saveScanPositions()/savePosition() because the position will
already have been saved now since we always save the position when we
release the latch.
* impl/store/access/heap/HeapRowLocation.java
Remove THROWASSERT from and complete implementation of
setFrom(DataValueDescriptor) to allow the RowLocation in the index row
to be copied when we save the position.
* impl/store/raw/data/BasePage.java
* iapi/store/raw/Page.java
Remove reference to protection record handle.
Add code to set the hint that repositioning from the root of the
B-tree is needed.
* iapi/store/raw/RecordHandle.java
Remove constant identifying the record protection handle that we no
longer use.
> Index split deadlock
> --------------------
>
> Key: DERBY-2991
> URL: https://issues.apache.org/jira/browse/DERBY-2991
> Project: Derby
> Issue Type: Bug
> Components: Store
> Affects Versions: 10.2.2.0, 10.3.1.4
> Environment: Windows XP, Java 6
> Reporter: Bogdan Calmac
> Assignee: Knut Anders Hatlen
> Attachments: d2991-2a.diff, d2991-2a.stat, d2991-preview-1a.diff,
> d2991-preview-1a.stat, d2991-preview-1b.diff, d2991-preview-1b.stat,
> d2991-preview-1c.diff, d2991-preview-1c.stat, d2991-preview-1d.diff,
> d2991-preview-1d.stat, d2991-preview-1e.diff, derby.log,
> InsertSelectDeadlock.java, perftest.diff, Repro2991.java,
> stacktraces_during_deadlock.txt, test-1.diff, test-2.diff, test-3.diff
>
>
> After doing dome research on the mailing list, it appears that the index
> split deadlock is a known behaviour, so I will start by describing the
> theoretical problem first and then follow with the details of my test case.
> If you have concurrent select and insert transactions on the same table, the
> observed locking behaviour is as follows:
> - the select transaction acquires an S lock on the root block of the index
> and then waits for an S lock on some uncommitted row of the insert transaction
> - the insert transaction acquires X locks on the inserted records and if it
> needs to do an index split creates a sub-transaction that tries to acquire an
> X lock on the root block of the index
> In summary: INDEX LOCK followed by ROW LOCK + ROW LOCK followed by INDEX LOCK
> = deadlock
> In the case of my project this is an important issue (lack of concurrency
> after being forced to use table level locking) and I would like to contribute
> to the project and fix this issue (if possible). I was wondering if someone
> that knows the code can give me a few pointers on the implications of this
> issue:
> - Is this a limitation of the top-down algorithm used?
> - Would fixing it require to use a bottom up algorithm for better
> concurrency (which is certainly non trivial)?
> - Trying to break the circular locking above, I would first question why
> does the select transaction need to acquire (and hold) a lock on the root
> block of the index. Would it be possible to ensure the consistency of the
> select without locking the index?
> -----
> The attached test (InsertSelectDeadlock.java) tries to simulate a typical
> data collection application, it consists of:
> - an insert thread that inserts records in batch
> - a select thread that 'processes' the records inserted by the other thread:
> 'select * from table where id > ?'
> The derby log provides detail about the deadlock trace and
> stacktraces_during_deadlock.txt shows that the inser thread is doing an index
> split.
> The test was run on 10.2.2.0 and 10.3.1.4 with identical behaviour.
> Thanks,
> Bogdan Calmac.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.