On Wed, Jan 09, 2019 at 11:20:50PM +0100, Heiner Kallweit wrote:
> On 28.12.2018 07:39, Heiner Kallweit wrote:
> > On 28.12.2018 07:34, Heiner Kallweit wrote:
> >> On 28.12.2018 02:31, Frederic Weisbecker wrote:
> >>> On Fri, Dec 28, 2018 at 12:11:12AM +0100, Heiner Kallweit wrote:
> >>>>
> >> [...]
> >>>
> >>> Interesting, the softirq is raised from hardirq but it's not handled at the end of
> >>> the IRQ. Are you running threaded IRQs by any chance? If so I would expect ksoftirqd
> >>> to handle the pending work before we go idle. However I can imagine a small window
> >>> where such an expectation may not be met: if the softirq is raised after the ksoftirqd
> >>> thread is parked (CPUHP_AP_SMPBOOT_THREADS), which is right before we disable the CPU
> >>> (CPUHP_TEARDOWN_CPU).
> >>>
> >> I have a network driver (r8169) using NAPI which runs in softirq context AFAIK.
> >> For testing purposes I sometimes trigger system suspend via network, so there is
> >> network adapter activity when system suspends. Apart from that nothing really
> >> exciting:
> >>             CPU0       CPU1       CPU2       CPU3
> >>    0:         43          0          0          0   IO-APIC    2-edge      timer
> >>    1:          4          0          0          0   IO-APIC    1-edge      i8042
> >>    8:          0          1          0          0   IO-APIC    8-fasteoi   rtc0
> >>    9:          0          0          0          0   IO-APIC    9-fasteoi   acpi
> >>   12:          0          0          0          5   IO-APIC   12-edge      i8042
> >>  120:          0          0          0          0   PCI-MSI 311296-edge      PCIe PME
> >>  121:          0          0          0          0   PCI-MSI 315392-edge      PCIe PME
> >>  122:          0          0          0          0   PCI-MSI 327680-edge      PCIe PME
> >>  123:          0          0       3328          0   PCI-MSI 294912-edge      ahci[0000:00:12.0]
> >>  124:          0        133          0          0   PCI-MSI 344064-edge      xhci_hcd
> >>  125:          0          0         32          0   PCI-MSI 245760-edge      mei_me
> >>  127:        381          0          0          0   PCI-MSI 1572864-edge      enp3s0
> >>  128:          0          0          0        236   PCI-MSI 32768-edge      i915
> >>  129:          0        374          0          0   PCI-MSI 229376-edge      snd_hda_intel:card0
> >>
> >>> I don't know if we can afford to ignore a softirq even at this late stage. We should
> >>> probably avoid leaking any. So here is a possible fix, if you don't mind trying:
> >>>
> >> I tested your patch and at least in the first minutes of testing couldn't reproduce
> >> the issue any longer. I tested manual system suspend and the following script you
> >> sent when we started to analyze the issue.
> >>
> > 
> > Also after some more time the issue didn't occur again. So it seems your analysis
> > was right and also the approach to fix it. Thanks!
> > Will let you know in case the issue should pop up again under special
> > circumstances.
> > 
> Frederic, so far this fix didn't appear in linux-next, are you going to submit it?

Yep, I'll cook up a proper changelog and let Thomas judge if the change is worth it.
