Hi Peter.

On 6/2/26 1:48 PM, Peter Zijlstra wrote:
On Tue, Jun 02, 2026 at 01:26:48PM +0530, Shrikanth Hegde wrote:


On 6/1/26 3:26 PM, Peter Zijlstra wrote:
On Mon, Jun 01, 2026 at 02:46:24PM +0530, Shrikanth Hegde wrote:

Ritesh, Mukesh, is the scenario below possible?

do_page_fault seems to enable IRQs in the interrupt handler.
Is that expected? If so, one might see:

-- do_page_fault (enter kernel mode)
     -- enables interrupts
     -- gets an interrupt, which sets need_resched
        -- irqentry_exit: sees it is kernel mode, only checks preempt_count
                          and calls preempt_schedule_irq(), which BUGs if
                          preempt_count is non-zero or interrupts are still
                          enabled (!irqs_disabled()). Hence the panic?
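To make the suspected failure concrete, here is a toy user-space model of it. Everything below (struct toy_cpu and the toy_* functions) is hypothetical and heavily simplified; it only mirrors the check at the top of preempt_schedule_irq(), BUG_ON(preempt_count() || !irqs_disabled()), and returns true where the kernel would hit that BUG_ON on the kernel-mode exit of the fault. The real exit path has more conditions (preemption model, etc.) that are not modeled.

```c
#include <stdbool.h>

/* Hypothetical toy state; not kernel structures. */
struct toy_cpu {
	bool irqs_enabled;          /* stands in for the local IRQ enable state */
	bool need_resched;          /* TIF_NEED_RESCHED stand-in */
	unsigned int preempt_count;
};

/*
 * Models a fault handler that enables IRQs (like
 * interrupt_cond_local_irq_enable()) and returns WITHOUT a matching
 * local_irq_disable(). An interrupt arriving meanwhile may set
 * need_resched.
 */
static void toy_fault_handler(struct toy_cpu *cpu, bool irq_arrives)
{
	cpu->irqs_enabled = true;
	if (irq_arrives)
		cpu->need_resched = true;
	/* missing: cpu->irqs_enabled = false; */
}

/*
 * Models the kernel-mode exit of the fault: with preempt_count == 0 and
 * need_resched set, the real code calls preempt_schedule_irq(), which
 * does BUG_ON(preempt_count() || !irqs_disabled()). Returns true where
 * the kernel would hit that BUG_ON.
 */
static bool toy_irqentry_exit_kernel(const struct toy_cpu *cpu)
{
	if (cpu->preempt_count != 0 || !cpu->need_resched)
		return false;          /* no preemption attempted */
	/* preempt_count is 0 here, so only !irqs_disabled() can trip */
	return cpu->irqs_enabled;
}
```

The panic needs both ingredients at once: IRQs left enabled by the handler and a pending reschedule, which fits a stress-dependent reproducer.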

Should do_page_fault do preempt_disable when it enables the interrupts?

No, it is expected for page-fault to be able to schedule. Specifically,
it must be able to sleep to support loading pages from disk.

Oh yes. Ok. Thanks for taking a look.


Please check the value of preempt_count() (does it perchance have
HARDIRQ_OFFSET?). Also, if the fault handler does enable IRQs, it must
also disable them again once done.
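For checking whether a raw preempt_count() value carries HARDIRQ_OFFSET, a small decoder helps. This is a user-space sketch; the masks are assumed from the generic layout in include/linux/preempt.h (PREEMPT_BITS 8, SOFTIRQ_BITS 8, HARDIRQ_BITS 4, NMI_BITS 4) and should be re-checked against the tree being debugged. The helper names are hypothetical.

```c
#include <stdbool.h>

/*
 * Field layout assumed from the generic include/linux/preempt.h.
 * Re-check these against the kernel tree being debugged.
 */
#define PREEMPT_MASK    0x000000ffU
#define SOFTIRQ_MASK    0x0000ff00U
#define HARDIRQ_MASK    0x000f0000U
#define NMI_MASK        0x00f00000U
#define SOFTIRQ_OFFSET  0x00000100U
#define HARDIRQ_OFFSET  0x00010000U
#define NMI_OFFSET      0x00100000U

struct pc_fields {
	unsigned int preempt;  /* preempt_disable() nesting */
	unsigned int softirq;  /* in SOFTIRQ_OFFSET units; on !PREEMPT_RT local_bh_disable() adds 2 */
	unsigned int hardirq;  /* number of HARDIRQ_OFFSETs, i.e. hardirq nesting */
	unsigned int nmi;
};

/* Split a raw preempt_count() value into its fields. */
static struct pc_fields decode_preempt_count(unsigned int pc)
{
	struct pc_fields f;

	f.preempt = pc & PREEMPT_MASK;
	f.softirq = (pc & SOFTIRQ_MASK) / SOFTIRQ_OFFSET;
	f.hardirq = (pc & HARDIRQ_MASK) / HARDIRQ_OFFSET;
	f.nmi     = (pc & NMI_MASK) / NMI_OFFSET;
	return f;
}

/* True if the value carries at least one HARDIRQ_OFFSET. */
static bool has_hardirq_offset(unsigned int pc)
{
	return (pc & HARDIRQ_MASK) != 0;
}
```

For example, a value of 0x00010201 would decode to one level of preempt_disable(), a softirq count of 2 and one HARDIRQ_OFFSET.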

Will check it.


Notably, I see ___do_page_fault() do interrupt_cond_local_irq_enable(),
but I'm not seeing a local_irq_disable() to match!

Yes, that's likely the culprit. ___do_page_fault() can run for a while with
interrupts enabled, and need_resched may get set in that window. If the fault
was taken in kernel mode, nothing disables interrupts again, so the subsequent
irqentry_exit() panics.

BTW I was able to consistently repro this on P9 with hackbench as below.

for i in {0..10}; do ./hackbench 10 process 10000 loops; done;
for i in {0..10}; do ./hackbench 20 process 10000 loops; done;
for i in {0..10}; do ./hackbench 30 process 10000 loops; done;
for i in {0..10}; do ./hackbench 40 process 10000 loops; done;    # usually panics here
for i in {0..10}; do ./hackbench 10 thread 10000 loops; done;
for i in {0..10}; do ./hackbench 20 thread 10000 loops; done;
for i in {0..10}; do ./hackbench -pipe 10 process 10000 loops; done;
for i in {0..10}; do ./hackbench -pipe 20 process 10000 loops; done;
for i in {0..10}; do ./hackbench -pipe 30 process 10000 loops; done;
for i in {0..10}; do ./hackbench -pipe 40 process 10000 loops; done;
for i in {0..10}; do ./hackbench -pipe 10 thread 10000 loops; done;
for i in {0..10}; do ./hackbench -pipe 20 thread 10000 loops; done;

Note: if I run ./hackbench 40 process 10000 loops alone, it doesn't panic.
Likely some continuous stress is needed to get into this case.

The diff below fixes it; with it, the test passes. Hackbench numbers aren't
super happy about it, though: it regresses a bit compared to baseline. But at
least there is no panic. I have also changed the BUG_ONs to WARN_ONs, since
interrupts are now disabled right after anyway. We could still fix the call
sites if the warning is seen.

diff --git a/arch/powerpc/include/asm/entry-common.h b/arch/powerpc/include/asm/entry-common.h
index de5601282755..7da373a56813 100644
--- a/arch/powerpc/include/asm/entry-common.h
+++ b/arch/powerpc/include/asm/entry-common.h
@@ -253,16 +253,17 @@ static inline void arch_interrupt_enter_prepare(struct pt_regs *regs)
 static inline void arch_interrupt_exit_prepare(struct pt_regs *regs)
 {
 	if (user_mode(regs)) {
-		BUG_ON(regs_is_unrecoverable(regs));
-		BUG_ON(regs_irqs_disabled(regs));
+		WARN_ON(regs_is_unrecoverable(regs));
+		WARN_ON(regs_irqs_disabled(regs));
 		/*
 		 * We don't need to restore AMR on the way back to userspace for KUAP.
 		 * AMR can only have been unlocked if we interrupted the kernel.
 		 */
 		kuap_assert_locked();
-
-		local_irq_disable();
 	}
+
+	/* irqentry_exit expects to be called with interrupts disabled */
+	local_irq_disable();
 }
 
 static inline void arch_interrupt_async_enter_prepare(struct pt_regs *regs)


I would suggest trying something a little more focussed like so:

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 806c74e0d5ab..b002c179415c 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -589,6 +589,7 @@ static __always_inline void __do_page_fault(struct pt_regs *regs)
 	err = ___do_page_fault(regs, regs->dar, regs->dsisr);
 	if (unlikely(err))
 		bad_page_fault(regs, err);
+	local_irq_disable();
 }
 
 DEFINE_INTERRUPT_HANDLER(do_page_fault)

Since only ___do_page_fault() will enable interrupts, you only need to
disable them again on its return path.
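The point of this focussed fix is that one disable after the error check covers every way out of the fault path. A toy model, assuming hypothetical toy_* stand-ins and a global flag in place of the real IRQ state, shows both the success and the bad_page_fault() paths ending with IRQs off:

```c
#include <stdbool.h>

/* Hypothetical toy IRQ state; stands in for the real local IRQ state. */
static bool irqs_on;

static void toy_local_irq_enable(void)  { irqs_on = true; }
static void toy_local_irq_disable(void) { irqs_on = false; }

/* Stands in for ___do_page_fault(): conditionally enables IRQs and
 * may return an error. */
static int toy___do_page_fault(bool enable_irqs, int err)
{
	if (enable_irqs)
		toy_local_irq_enable();
	return err;
}

/* Stands in for bad_page_fault(); a no-op in this model. */
static void toy_bad_page_fault(int err) { (void)err; }

/* Stands in for __do_page_fault() with the suggested one-line fix:
 * the disable sits after the error handling, so it runs on both
 * return paths. */
static void toy__do_page_fault(bool enable_irqs, int err)
{
	int ret = toy___do_page_fault(enable_irqs, err);

	if (ret)
		toy_bad_page_fault(ret);
	toy_local_irq_disable();
}
```

Disabling when IRQs were never enabled is harmless here, which is why the sketch does not track whether the conditional enable actually happened.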


Seems there are more handlers that enable interrupts...

do_program_check (called by program_check_exception, emulation_assist_interrupt)
alignment_exception
SPEFloatingPointException
facility_unavailable_exception


Many of them look like they can recover only if hit in userspace.
Hence I thought it would make sense to put the local_irq_disable() in
arch_interrupt_exit_prepare(), which is called just before irqentry_exit().
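The argument for the central placement can be sketched with a toy model: every handler returns through one exit-prepare step that disables IRQs unconditionally, so a handler that forgets to disable cannot reach irqentry_exit() with IRQs on. All names below are hypothetical stand-ins for the handlers listed above, not the real functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IRQ state. */
static bool irqs_on;

typedef void (*toy_handler_t)(void);

/* Handlers that enable IRQs and return without disabling them. */
static void toy_program_check(void)    { irqs_on = true; }
static void toy_alignment(void)        { irqs_on = true; }
static void toy_spe_fp(void)           { irqs_on = true; }
static void toy_facility_unavail(void) { irqs_on = true; }

/* A handler that enables and then disables again. */
static void toy_well_behaved(void)     { irqs_on = true; irqs_on = false; }

/* Stands in for arch_interrupt_exit_prepare() with the proposed change. */
static void toy_exit_prepare(bool user_mode)
{
	(void)user_mode;   /* user-mode sanity checks elided */
	irqs_on = false;   /* irqentry_exit() expects IRQs disabled */
}

/* Runs one interrupt and returns the IRQ state that the modeled
 * irqentry_exit() would see. */
static bool toy_run_interrupt(toy_handler_t h, bool user_mode)
{
	irqs_on = false;   /* interrupts arrive with IRQs disabled */
	h();
	toy_exit_prepare(user_mode);
	return irqs_on;
}

static const toy_handler_t toy_handlers[] = {
	toy_program_check, toy_alignment, toy_spe_fp,
	toy_facility_unavail, toy_well_behaved,
};
```

The trade-off against the per-handler fix is that the disable now runs on every exit, including for handlers that never enabled IRQs.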
