On 28.12.2018 02:31, Frederic Weisbecker wrote:
> On Fri, Dec 28, 2018 at 12:11:12AM +0100, Heiner Kallweit wrote:
>>
[...]
> 
> Interesting, the softirq is raised from hardirq but it's not handled at the
> end of the IRQ. Are you running threaded IRQs by any chance? If so I would
> expect ksoftirqd to handle the pending work before we go idle. However I can
> imagine a small window where such an expectation may not be met: if the
> softirq is raised after the ksoftirqd thread is parked
> (CPUHP_AP_SMPBOOT_THREADS), which is right before we disable the CPU
> (CPUHP_TEARDOWN_CPU).
> 
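One way to check the threaded-IRQ question is to look for the `threadirqs`
parameter on the kernel command line. The following is only a minimal sketch;
the helper name `cmdline_has_threadirqs` is made up for illustration and is not
part of any existing tool.

```bash
#!/bin/bash
# Sketch: report whether a kernel command line string contains the
# "threadirqs" parameter. cmdline_has_threadirqs is a hypothetical name.
cmdline_has_threadirqs()
{
        # Pad with spaces so only a whole word matches.
        case " $1 " in
                *" threadirqs "*) echo yes ;;
                *)                echo no ;;
        esac
}

# Only query the running system when /proc/cmdline is readable.
if [ -r /proc/cmdline ]; then
        echo "threadirqs on command line: $(cmdline_has_threadirqs "$(cat /proc/cmdline)")"
fi
```

Threaded handlers also show up as kernel threads named irq/<nr>-<name> in the
process list, so `ps` can serve as a cross-check.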
I have a network driver (r8169) using NAPI, which runs in softirq context AFAIK.
For testing purposes I sometimes trigger system suspend via the network, so
there is network adapter activity when the system suspends. Apart from that,
nothing really exciting:
            CPU0       CPU1       CPU2       CPU3
   0:         43          0          0          0   IO-APIC    2-edge      timer
   1:          4          0          0          0   IO-APIC    1-edge      i8042
   8:          0          1          0          0   IO-APIC    8-fasteoi   rtc0
   9:          0          0          0          0   IO-APIC    9-fasteoi   acpi
  12:          0          0          0          5   IO-APIC   12-edge      i8042
 120:          0          0          0          0   PCI-MSI 311296-edge      PCIe PME
 121:          0          0          0          0   PCI-MSI 315392-edge      PCIe PME
 122:          0          0          0          0   PCI-MSI 327680-edge      PCIe PME
 123:          0          0       3328          0   PCI-MSI 294912-edge      ahci[0000:00:12.0]
 124:          0        133          0          0   PCI-MSI 344064-edge      xhci_hcd
 125:          0          0         32          0   PCI-MSI 245760-edge      mei_me
 127:        381          0          0          0   PCI-MSI 1572864-edge      enp3s0
 128:          0          0          0        236   PCI-MSI 32768-edge      i915
 129:          0        374          0          0   PCI-MSI 229376-edge      snd_hda_intel:card0
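
Since NAPI polling runs in the NET_RX softirq, the per-CPU softirq counters in
/proc/softirqs can complement the hardirq counts above. A minimal sketch for
pulling out one row; the helper name `softirq_row` is made up for illustration.

```bash
#!/bin/bash
# Sketch: print the per-CPU counters of one softirq type from input in
# /proc/softirqs format. softirq_row is a hypothetical helper name.
softirq_row()
{
        # $1 = softirq name such as NET_RX; the table is read from stdin.
        # Blank the label field, strip leading spaces, print the counters.
        awk -v name="$1:" '$1 == name { $1 = ""; sub(/^ +/, ""); print }'
}

# Only query the running system when /proc/softirqs is readable.
if [ -r /proc/softirqs ]; then
        echo "NET_RX per CPU: $(softirq_row NET_RX < /proc/softirqs)"
fi
```

Comparing the output before and after a suspend attempt would show on which
CPU the network softirqs were actually processed.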

> I don't know if we can afford to ignore a softirq even at this late stage. We
> should probably avoid leaking any. So here is a possible fix, if you don't
> mind trying:
> 
I tested your patch and, at least in the first minutes of testing, couldn't
reproduce the issue any longer. I tested manual system suspend and the following
script you sent when we started to analyze the issue.

Heiner

--------------------------------------------------------------------------

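#!/bin/bash
# CPU hotplug stress loop: repeatedly offline and online CPUs 1..N-1.
# Writing to the sysfs "online" files requires root.

# do_hotplug STATE LAST: write STATE (0 = offline, 1 = online) to cpu1..cpuLAST.
do_hotplug()
{
        for i in $(seq 1 "$2")
        do
                echo "$1" > "/sys/devices/system/cpu/cpu$i/online"
        done
}

# Highest CPU number; cpu0 is left alone because the loop starts at 1.
LAST_CPU=$(($(nproc)-1))

while true
do
        do_hotplug 0 "$LAST_CPU"
        do_hotplug 1 "$LAST_CPU"
done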
