I think we are all aligned on the purpose. It's to avoid enabling the interrupts in the end of RestoreTPL (HIGH->non-HIGH) in the interrupt context. The discussion is about how to implement it.
Michael Brown's idea is to avoid changing DxeCore but add a customized RaiseTpl/RestoreTpl implementation in a lib and request Timer driver calls it. That lib was implemented very smartly. It includes while-loop, implicitly-recursive, implicitly-requiring NESTED_INTERRUPT_STATE in global storage not in stack as local variable. I really do NOT like the future that every timer driver calls that lib to avoid the potential stack overflow. It's so complicated! And it's called in every 10ms!! Paolo, I don't fully understand your patch especially the following changes. 3 comments embedded. @@ -161,5 +191,46 @@ CoreRestoreTpl ( IN EFI_TPL NewTpl ) { + BOOLEAN InInterruptHandler = FALSE; + + // + // Unwind the nested interrupt handlers up to the required + // TPL, paying attention not to overflow the stack. While + // not strictly necessary according to the specification, + // accept the possibility that multiple RaiseTPL calls are + // undone by a single RestoreTPL + // + while ((INTN)NewTpl <= HighBitSet64 (mInterruptedTplMask)) { 1. why "<="? I thought when RestoreTPL() is called there are only two cases: a. NewTpl == HighBitSet64 (...) b. NewTpl > HighBitSet64 (...) 1.a is the case when TimerInterruptHandler() or CoreTimerTick() restores TPL from HIGH to non-HIGH. 1.b is the case when the pending event backs call RaiseTPL/RestoreTPL(). Because only pending events whose TPL > "Interrupted TPL" can run, the RestoreTPL() call from the event callbacks cannot change the TPL to a value less than or equal to "Interrupted TPL". So, I think "<=" can be "==". 2. can you explain a bit more about the reason of "while"? + UINTN InterruptedTpl = HighBitSet64 (mInterruptedTplMask); + mInterruptedTplMask &= ~(UINTN)(1 << InterruptedTpl); + + ASSERT (GetInterruptState () == FALSE); + InInterruptHandler = TRUE; + + // + // Take the TPL down a notch to allow event notifications to be + // dispatched. This will implicitly re-enable interrupts (if + // InterruptedTPL is below TPL_HIGH_LEVEL), even though we are + // still inside the interrupt handler, but the new TPL will + // be set while they are disabled. + // + // DesiredInterruptState must be FALSE to ensure that the + // stack does not blow up. If we used, as in the final call + // below, "InterruptedTpl < TPL_HIGH_LEVEL", the timer interrupt + // handler could be invoked repeatedly in the small window between + // CoreSetInterruptState (TRUE) and the IRET instruction. + // + CoreRestoreTplInternal (InterruptedTpl, FALSE); + + if (InterruptedTpl == NewTpl) { + break; 3. "break" or "return"? I think we should exit from this function. + } + } + + // + // If we get here with InInterruptHandler == TRUE, an interrupt + // handler forgot to restore the TPL. + // + ASSERT (!InInterruptHandler); CoreRestoreTplInternal (NewTpl, NewTpl < TPL_HIGH_LEVEL); } Thanks, Ray > -----Original Message----- > From: Paolo Bonzini <pbonz...@redhat.com> > Sent: Friday, March 1, 2024 8:14 AM > To: Ni, Ray <ray...@intel.com> > Cc: devel@edk2.groups.io; Kinney, Michael D <michael.d.kin...@intel.com>; > Liming Gao <gaolim...@byosoft.com.cn>; Laszlo Ersek <ler...@redhat.com>; > Michael Brown <mc...@ipxe.org> > Subject: Re: [PATCH 2/2] MdeModulePkg/DxeCore: Fix stack overflow issue > due to nested interrupts > > On Thu, Feb 29, 2024 at 2:04 PM Ray Ni <ray...@intel.com> wrote: > > @@ -134,9 +262,9 @@ CoreRestoreTpl ( > > } > > > > // > > - // Set the new value > > + // Set the new TPL with interrupt disabled. > > // > > - > > + CoreSetInterruptState (FALSE); > > gEfiCurrentTpl = NewTpl; > > > > // > > @@ -144,7 +272,22 @@ CoreRestoreTpl ( > > // interrupts are enabled > > // > > if (gEfiCurrentTpl < TPL_HIGH_LEVEL) { > > - CoreSetInterruptState (TRUE); > > + if ((INTN)gEfiCurrentTpl > HighBitSet64 (mInterruptedTplMask)) { > > + // > > + // Only enable interrupts if restoring to a level above the highest > > + // interrupted TPL level. This allows interrupt nesting, but only > > for > > + // events at higher TPL level than the current TPL level. > > + // > > + CoreSetInterruptState (TRUE); > > + } else { > > + // > > + // Clear interrupted TPL level mask, but do not re-enable interrupts > here > > + // This will return to CoreTimerTick() and interrupts will be > re-enabled > > + // when the timer interrupt handlers returns from interrupt > context. > > + // > > + ASSERT ((INTN)gEfiCurrentTpl == HighBitSet64 > (mInterruptedTplMask)); > > + mInterruptedTplMask &= ~(UINTN)(1 << gEfiCurrentTpl); > > + } > > } > > Ok, now I understand what's going on and it's indeed the same logic as > NestedInterruptTplLib, with DisableInterruptsOnIret() replaced by > skipping CoreSetInterruptState(TRUE). It's similar to what I proposed > elsewhere in the thread, just written differenty. > > I agree with Michael Brown that the spec is unclear on the state of > the interrupt flag on exit from gBS->RestoreTPL(), but perhaps this > change is feasible if the interrupt handlers just raise the TPL first > and restore it last. > > Just as an exercise for me to understand the code better, I tried > rewriting the code in terms of the CoreRestoreTplInternal() function > that I proposed. I find it easier to read, but I guess that's a bit in > the eye of the beholder, and it is a little more defensively coded. I > attach it (untested beyond compilation) for reference. > > Paolo -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#116207): https://edk2.groups.io/g/devel/message/116207 Mute This Topic: https://groups.io/mt/104642317/21656 Group Owner: devel+ow...@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-