On Tue, Dec 01, 2020 at 08:14:41PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 01, 2020 at 06:57:37PM +0000, Mark Rutland wrote:
> > On Tue, Dec 01, 2020 at 07:15:06PM +0100, Peter Zijlstra wrote:
> > > On Tue, Dec 01, 2020 at 03:55:19PM +0100, Peter Zijlstra wrote:
> > > > On Tue, Dec 01, 2020 at 06:46:44AM -0800, Paul E. McKenney wrote:
> > > > 
> > > > > > So after having talked to Sven a bit, the thing that is happening, 
> > > > > > is
> > > > > > that this is the one place where we take interrupts with RCU being
> > > > > > disabled. Normally RCU is watching and all is well, except during 
> > > > > > idle.
> > > > > 
> > > > > Isn't interrupt entry supposed to invoke rcu_irq_enter() at some 
> > > > > point?
> > > > > Or did this fall victim to recent optimizations?
> > > > 
> > > > It does, but the problem is that s390 is still using
> > > 
> > > I might've been too quick there, I can't actually seem to find where
> > > s390 does rcu_irq_enter()/exit().
> > > 
> > > Also, I'm thinking the below might just about solve the current problem.
> > > The next problem would then be it calling TRACE_IRQS_ON after it did
> > > rcu_irq_exit()... :/
> > 
> > I gave this patch a go under QEMU TCG atop v5.10-rc6 s390 defconfig with
> > PROVE_LOCKING and DEBUG_ATOMIC_SLEEP. It significantly reduces the
> > number of lockdep splats, but IIUC we need to handle the io_int_handler
> > path in addition to the ext_int_handler path, and there's a remaining
> > lockdep splat (below).
> 
> I'm amazed it didn't actually make things worse, given how I failed to
> spot do_IRQ() was arch code etc..
> 
> > If this ends up looking like we'll need more point-fixes, I wonder if we
> > should conditionalise the new behaviour of the core idle code under a
> > new CONFIG symbol for now, and opt-in x86 and arm64, then transition the
> > rest once they've had a chance to test. They'll still be broken in the
> > mean time, but no more so than they previously were.
> 
> We can do that I suppose... :/

Well, the following small patch works for me (plus an additional call to
trace_hardirqs_on() in our udelay implementation - but that's probably
independent).
Is there a reason why this should be considered broken?

diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index 26bb0603c5a1..92beb1444644 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -763,12 +763,7 @@ ENTRY(io_int_handler)
        xc      __PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
        TSTMSK  __LC_CPU_FLAGS,_CIF_IGNORE_IRQ
        jo      .Lio_restore
-#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
-       tmhh    %r8,0x300
-       jz      1f
        TRACE_IRQS_OFF
-1:
-#endif
        xc      __SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
 .Lio_loop:
        lgr     %r2,%r11                # pass pointer to pt_regs
@@ -791,12 +786,7 @@ ENTRY(io_int_handler)
        TSTMSK  __LC_CPU_FLAGS,_CIF_WORK
        jnz     .Lio_work
 .Lio_restore:
-#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
-       tm      __PT_PSW(%r11),3
-       jno     0f
        TRACE_IRQS_ON
-0:
-#endif
        mvc     __LC_RETURN_PSW(16),__PT_PSW(%r11)
        tm      __PT_PSW+1(%r11),0x01   # returning to user ?
        jno     .Lio_exit_kernel
@@ -976,12 +966,7 @@ ENTRY(ext_int_handler)
        xc      __PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
        TSTMSK  __LC_CPU_FLAGS,_CIF_IGNORE_IRQ
        jo      .Lio_restore
-#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
-       tmhh    %r8,0x300
-       jz      1f
        TRACE_IRQS_OFF
-1:
-#endif
        xc      __SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
        lgr     %r2,%r11                # pass pointer to pt_regs
        lghi    %r3,EXT_INTERRUPT
diff --git a/arch/s390/kernel/idle.c b/arch/s390/kernel/idle.c
index 2b85096964f8..5bd8c1044d09 100644
--- a/arch/s390/kernel/idle.c
+++ b/arch/s390/kernel/idle.c
@@ -123,7 +123,6 @@ void arch_cpu_idle_enter(void)
 void arch_cpu_idle(void)
 {
        enabled_wait();
-       raw_local_irq_enable();
 }
 
 void arch_cpu_idle_exit(void)

Reply via email to