On Fri, Oct 19, 2018 at 04:31:30PM +0200, Thomas Gleixner wrote: > On Thu, 18 Oct 2018, Lukas Wunner wrote: > > Commit 1e77d0a1ed74 ("genirq: Sanitize spurious interrupt detection of > > threaded irqs") made detection of spurious interrupts work for threaded > > handlers by: > > > > a) incrementing a counter every time the thread returns IRQ_HANDLED, and > > b) checking whether that counter has increased every time the thread is > > woken. > > > > However for oneshot interrupts, the commit unmasks the interrupt before > > incrementing the counter. If another interrupt occurs right after > > unmasking but before the counter is incremented, that interrupt is > > incorrectly considered spurious: > > > > time > > | irq_thread() > > | irq_thread_fn() > > | action->thread_fn() > > | irq_finalize_oneshot() > > | unmask_threaded_irq() /* interrupt is unmasked */ > > | > > | /* interrupt fires, incorrectly deemed spurious */ > > | > > | atomic_inc(&desc->threads_handled); /* counter is incremented */ > > v > > > > I am seeing this with a hi3110 CAN controller receiving data at high > > volume (from a separate machine sending with "cangen -g 0 -i -x"): > > The controller signals a huge number of interrupts (hundreds of millions > > per day) and every second there are about a dozen which are deemed > > spurious. The issue is benign in this case, mostly just an irritation, > > but I'm worrying that at high CPU load and in the presence of higher > > priority tasks, the number of incorrectly detected spurious interrupts > > might increase beyond the 99,900 threshold and cause disablement of the > > IRQ. > > I doubt that this can happen in reality, so I'd rather reword that > paragraph slightly: > > In theory high CPU load and in the presence of higher priority tasks, the > number of incorrectly detected spurious interrupts might increase beyond > the 99,900 threshold and cause disablement of the interrupt. > > In practice it just increments the spurious interrupt count. But that can > cause people to waste time investigating it over and over. > > Hmm?
Sure, fine by me. Would you prefer me to resend with that change or can you fold it in when applying? FWIW I did manage to reach the 99,900 threshold once because I had added copious amounts of printk() to the hi3110 IRQ thread to debug another issue. But I never experienced that without those printk()'s. Here's the resulting splat: irq 194: nobody cared (try booting with the "irqpoll" option) CPU: 0 PID: 1929 Comm: candump Tainted: G O 4.9.76-rt60-v7+ #1 Hardware name: BCM2835 [<8011106c>] (unwind_backtrace) from [<8010cdd8>] (show_stack+0x20/0x24) [<8010cdd8>] (show_stack) from [<8047cb2c>] (dump_stack+0xc8/0x10c) [<8047cb2c>] (dump_stack) from [<8018192c>] (__report_bad_irq+0x3c/0xdc) [<8018192c>] (__report_bad_irq) from [<80181d94>] (note_interrupt+0x29c/0x2ec) [<80181d94>] (note_interrupt) from [<8017ec9c>] (handle_irq_event_percpu+0x78/0x84) [<8017ec9c>] (handle_irq_event_percpu) from [<8017ed20>] (handle_irq_event+0x78/0xbc) [<8017ed20>] (handle_irq_event) from [<80182ad8>] (handle_edge_irq+0x13c/0x1e8) [<80182ad8>] (handle_edge_irq) from [<8017db64>] (generic_handle_irq+0x34/0x44) [<8017db64>] (generic_handle_irq) from [<804ae1a0>] (bcm2835_gpio_irq_handle_bank+0x88/0xac) [<804ae1a0>] (bcm2835_gpio_irq_handle_bank) from [<804ae2ac>] (bcm2835_gpio_irq_handler+0xe8/0x154) [<804ae2ac>] (bcm2835_gpio_irq_handler) from [<8017db64>] (generic_handle_irq+0x34/0x44) [<8017db64>] (generic_handle_irq) from [<804a7720>] (bcm2836_chained_handle_irq+0x38/0x50) [<804a7720>] (bcm2836_chained_handle_irq) from [<8017db64>] (generic_handle_irq+0x34/0x44) [<8017db64>] (generic_handle_irq) from [<8017e144>] (__handle_domain_irq+0x6c/0xc4) [<8017e144>] (__handle_domain_irq) from [<8010155c>] (bcm2836_arm_irqchip_handle_irq+0xac/0xb0) [<8010155c>] (bcm2836_arm_irqchip_handle_irq) from [<80775dec>] (__irq_usr+0x4c/0x60) Exception stack(0xb6b15fb0 to 0xb6b15ff8) 5fa0: 76ec1d50 0000000a 011f8026 fbad2aa4 5fc0: 76ec1d50 0000000a 76f1a000 000263bc 00000000 000263fc 00000001 7ec3f470 5fe0: 00000444 7ec3f2e8 76de5b90 76def528 40000010 ffffffff handlers: [<8017edc0>] irq_default_primary_handler threaded [<7f37c734>] hi3110_can_ist [hi311x] Disabling IRQ #194 Thanks, Lukas