> Date: Sat, 21 Nov 2020 14:14:21 +0100
> From: Otto Moerbeek <[email protected]>
>
> On Fri, Nov 20, 2020 at 04:25:31PM +0100, Otto Moerbeek wrote:
>
> > On Fri, Nov 20, 2020 at 02:01:55PM +0100, Claudio Jeker wrote:
> >
> > > On Fri, Nov 20, 2020 at 11:32:18AM +0100, Otto Moerbeek wrote:
> > > > On Fri, Nov 20, 2020 at 11:09:25AM +0100, Mark Kettenis wrote:
> > > >
> > > > > It's a relatively new driver. It uses MSI which pretty much rules out
> > > > > an issue with shared interrupts. So I suspect this is an issue with
> > > > > the rge(4) driver. In the past we have fun with packet counter
> > > > > overflow interrupts. Is the storm present immediately after you bring
> > > > > up the interface? Or even before?
> > > >
> > > > No storm if not configured and no cable plugged in.
> > > > No storm if not configured and cable plugged in
> > > > No storm if configured and no cable
> > > >
> > > > Storm start when I plug the cable in.
> > >
> > > Sounds like an unexpected interrupt source that should probably be masked.
> > >
> > > I would look at rge_intr() and what status you get and compare it to the
> > > RGE_ISR defines. This may help to figure out what is going on.
> > >
> > > --
> > > :wq Claudio
> >
> > The value of status after the RGE_READ_4 call is 0x10 all the time:
> > RGE_ISR_RX_DESC_UNAVAIL
> >
> > 	-Otto
> >
>
> If I apply the diff below the device starts to work without interrupt storm.
> This is pure blind coding, I have little idea what I'm doing...
>
> 	-Otto
>
> Index: dev/pci/if_rgereg.h
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/if_rgereg.h,v
> retrieving revision 1.4
> diff -u -p -r1.4 if_rgereg.h
> --- dev/pci/if_rgereg.h	31 Oct 2020 07:50:41 -0000	1.4
> +++ dev/pci/if_rgereg.h	21 Nov 2020 13:06:39 -0000
> @@ -88,7 +88,7 @@
>
>  #define RGE_INTRS		\
>  	(RGE_ISR_RX_OK | RGE_ISR_RX_ERR | RGE_ISR_TX_OK |		\
> -	RGE_ISR_TX_ERR | RGE_ISR_RX_DESC_UNAVAIL | RGE_ISR_LINKCHG |	\
> +	RGE_ISR_TX_ERR | RGE_ISR_LINKCHG |	\
>  	RGE_ISR_TX_DESC_UNAVAIL | RGE_ISR_PCS_TIMEOUT | RGE_ISR_SYSTEM_ERR)
>
>  #define RGE_INTRS_TIMER		\

That makes some sense.  The description of that bit suggests this is
the interrupt you get when a packet arrives but there is no room in
the rx ring for it.

That bit isn't actually all that useful.  It could be used to account
for dropped packets, but it isn't.  It could also provide a trigger to
refill the ring if for some reason we end up with an empty rx ring.
In practice that doesn't work so well though, since a steady stream of
packets means the interrupt keeps firing and can keep the kernel from
doing the work needed to free up mbufs so that they can be put back on
the ring.  It is better to use a timeout to refill the ring if the
minimum number of mbufs on the ring can't be maintained.

The question remains why the interrupt keeps firing in a scenario
where the ring should have enough mbufs on it.  But the answer may
turn out to be irrelevant.
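
To make the timeout idea concrete, here is a small user-space model in
C.  It is only a sketch and not the rge(4) code: the struct, the
function names, the ring size and the low-water mark are all made up
for illustration, and the "timeout" is just a flag standing in for a
real kernel timeout.  The point it shows is that a failed refill arms
a later retry instead of relying on the RX_DESC_UNAVAIL interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RX_RING_SIZE	8	/* hypothetical number of rx slots */
#define RX_RING_MIN	2	/* hypothetical low-water mark */

/* Hypothetical model of an rx ring; not taken from if_rge.c. */
struct rx_ring {
	size_t	filled;			/* slots holding an empty buffer */
	size_t	pool;			/* buffers the allocator can hand out */
	bool	timeout_pending;	/* stand-in for a scheduled timeout */
};

/*
 * Move buffers from the pool onto the ring.  If the ring stays below
 * the low-water mark, arm the retry flag instead of waiting for an
 * interrupt.  Returns the number of slots filled.
 */
static size_t
rx_fill(struct rx_ring *r)
{
	size_t n = 0;

	while (r->filled < RX_RING_SIZE && r->pool > 0) {
		r->pool--;
		r->filled++;
		n++;
	}
	assert(r->filled <= RX_RING_SIZE);
	r->timeout_pending = (r->filled < RX_RING_MIN);
	return n;
}

/* Timer callback: by now the stack may have freed buffers, so retry. */
static void
rx_refill_tick(struct rx_ring *r)
{
	if (!r->timeout_pending)
		return;
	r->timeout_pending = false;
	rx_fill(r);
}

/* A packet arrives.  Returns false if it is dropped for lack of a buffer. */
static bool
rx_packet(struct rx_ring *r)
{
	if (r->filled == 0)
		return false;
	r->filled--;
	return true;
}

/* The network stack hands n buffers back to the allocator. */
static void
rx_buf_free(struct rx_ring *r, size_t n)
{
	r->pool += n;
}
```

With a ring that starts with a single buffer in the pool, rx_fill()
leaves the retry flag set.  Packets that arrive while the ring is empty
are simply dropped, and once rx_buf_free() has returned some buffers the
next rx_refill_tick() puts them back on the ring.  No interrupt is
needed to make progress.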
