SuperServer could hang when changing physical backup state under high load
--------------------------------------------------------------------------
Key: CORE-5613
URL: http://tracker.firebirdsql.org/browse/CORE-5613
Project: Firebird Core
Issue Type: Bug
Components: Engine
Affects Versions: 4.0 Alpha 1, 3.0.2, 3.0.1, 3.0.0
Reporter: Vlad Khorsun
The issue was detected while testing nbackup during a TPCC run with 64
concurrent connections.
The engine could hang immediately after begin/end backup, i.e. after a physical
state change.
A few threads wait forever in RWLock::beginRead() for
BackupManager::localStateLock.
The wait can never succeed, as localStateLock has no owner.
Also, the lock value is -1, which should never happen.
All other threads wait for bdb latches already acquired by the threads above.
The problem is caused by a race condition:
- the backup thread acquires localStateLock in Write mode (see
  BackupManager::StateWriteGuard) and sets the TDBB_backup_write_locked flag
  (see BackupManager::lockStateWrite),
  then it marks the header page and sets the BDB_nbak_state_lock flag on its
  BufferDesc;
  note that this mark does not acquire localStateLock in Read mode because of
  BDB_nbak_state_lock (see CCH\set_diff_page() and
  BackupManager::lockStateRead);
  then the backup thread releases the header page (it does not release
  localStateLock)
- another thread commits and flushes dirty pages; it writes the dirty header
  page and releases localStateLock (see CCH\clear_dirty_flag_and_nbak_state),
  because the BufferDesc has the BDB_nbak_state_lock flag set and the tdbb is
  not marked with the TDBB_backup_write_locked flag
- the backup thread releases localStateLock in Write mode (see
  ~StateWriteGuard)
I.e. we have an excess RWLock::endRead call, which breaks the lock state and
leads to the hang.
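
The sequence above can be replayed in a small single-threaded C++ sketch. It is
a toy model only, not Firebird's RWLock or cache code: ToyRWLock, ToyBuffer,
ToyThread, kWriterOffset and replayRace are hypothetical names, and the counter
encoding (reader count >= 0, a large negative offset for a writer) is an
assumption chosen for illustration. Under that assumption, the unbalanced
endRead leaves the counter at -1 with no owner, so every later reader blocks.

```cpp
#include <cassert>

// Assumed encoding for this toy lock only:
// 0 = free, N > 0 = N readers, -kWriterOffset = one writer.
constexpr int kWriterOffset = 50000;

struct ToyRWLock
{
    int state = 0;

    bool tryBeginRead()
    {
        if (state < 0)
            return false;   // looks write-locked; a real reader would block here
        ++state;
        return true;
    }

    // A bare counter: nothing checks that the caller really holds a read lock.
    void endRead() { --state; }

    bool tryBeginWrite()
    {
        if (state != 0)
            return false;
        state = -kWriterOffset;
        return true;
    }

    void endWrite() { state += kWriterOffset; }
};

// Hypothetical stand-ins for the BufferDesc and tdbb flags named in the report.
struct ToyBuffer { bool nbakStateLock = false; };
struct ToyThread { bool backupWriteLocked = false; };

// Models the flush-side decision described above: drop the read lock that the
// buffer flag claims is held, unless the flushing thread is the backup writer.
void clearDirtyFlagAndNbakState(ToyRWLock& lock, ToyBuffer& bdb,
                                const ToyThread& tdbb)
{
    if (!bdb.nbakStateLock)
        return;
    bdb.nbakStateLock = false;
    if (!tdbb.backupWriteLocked)
        lock.endRead();     // the excess endRead in the racy interleaving
}

// Replays the three steps of the race and returns the final counter value.
int replayRace()
{
    ToyRWLock lock;
    ToyBuffer header;
    ToyThread backup, committer;

    // 1. Backup thread: write lock taken, header marked without a read lock.
    const bool locked = lock.tryBeginWrite();
    assert(locked);
    (void) locked;
    backup.backupWriteLocked = true;
    header.nbakStateLock = true;

    // 2. Another thread flushes the dirty header page.
    clearDirtyFlagAndNbakState(lock, header, committer);

    // 3. Backup thread releases the write lock.
    lock.endWrite();
    backup.backupWriteLocked = false;

    return lock.state;
}
```

In this model replayRace() returns -1: the counter goes from -50000 to -50001
on the stray endRead and then to -1 on endWrite. A lock left in that state
rejects every tryBeginRead, although no thread owns it.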
For the problem to occur, there must be very short transactions that fit (from
start to finish) into the small time window between the release of the header
page and the release of localStateLock by the backup thread.