On Wed, 10 Jan 2024 at 02:44, Andy Fan <zhihuifan1...@163.com> wrote: > Hi, > > I want to know if Andres or you have plan > to do some code review. I don't expect this would happen very soon, just > want to make sure this will not happen that both of you think the other > one will do, but actually none of them does it in fact. a commit fest > [1] has been added for this.
> +++ b/src/backend/storage/buffer/bufmgr.c > @@ -5419,6 +5419,7 @@ LockBufHdr(BufferDesc *desc) > perform_spin_delay(&delayStatus); > } > finish_spin_delay(&delayStatus); > + START_SPIN_LOCK(); > return old_buf_state | BM_LOCKED; > } I think that we need to 'arm' the checks just before we lock the spin lock, and 'disarm' the checks just after we unlock the spin lock, rather than after and before, respectively. That way, we won't have a chance of false negatives: with your current patch it is possible that an interrupt fires between the acquisition of the lock and the code in START_SPIN_LOCK() marking the thread as holding a spin lock, which would cause any check in that signal handler to incorrectly read that we don't hold any spin locks. > +++ b/src/backend/storage/lmgr/lock.c > @@ -776,6 +776,8 @@ LockAcquireExtended(const LOCKTAG *locktag, > bool found_conflict; > bool log_lock = false; > > + Assert(SpinLockCount == 0); > + I'm not 100% sure on the policy of this, but theoretically you could use LockAquireExtended(dontWait=true) while holding a spin lock, as that would not have an unknown duration. Then again, this function also does elog/ereport, which would cause issues, still, so this code may be the better option. > + elog(PANIC, "stuck spinlock detected at %s, %s:%d after waiting for %u > ms", > + func, file, line, delay_ms); pg_usleep doesn't actually guarantee that we'll wait for exactly that duration; depending on signals received while spinning and/or OS scheduling decisions it may be off by orders of magnitude. > +++ b/src/common/scram-common.c This is unrelated to the main patchset. > +++ b/src/include/storage/spin.h Minor: I think these changes could better be included in miscadmin, or at least the definition for SpinLockCount should be moved there: The spin lock system itself shouldn't be needed in places where we need to make sure that we don't hold any spinlocks, and miscadmin.h already holds things related to "System interrupt and critical section handling", which seems quite related. Kind regards, Matthias van de Meent