On Sun, Nov 22, 2020 at 6:19 AM Mark Kettenis <[email protected]> wrote:
> > Date: Sat, 21 Nov 2020 14:14:21 +0100
> > From: Otto Moerbeek <[email protected]>
> >
> > On Fri, Nov 20, 2020 at 04:25:31PM +0100, Otto Moerbeek wrote:
> >
> > > On Fri, Nov 20, 2020 at 02:01:55PM +0100, Claudio Jeker wrote:
> > >
> > > > On Fri, Nov 20, 2020 at 11:32:18AM +0100, Otto Moerbeek wrote:
> > > > > On Fri, Nov 20, 2020 at 11:09:25AM +0100, Mark Kettenis wrote:
> > > > >
> > > > > > It's a relatively new driver. It uses MSI which pretty much rules out
> > > > > > an issue with shared interrupts. So I suspect this is an issue with
> > > > > > the rge(4) driver. In the past we have fun with packet counter
> > > > > > overflow interrupts. Is the storm present immediately after you bring
> > > > > > up the interface? Or even before?
> > > > >
> > > > > No storm if not configured and no cable plugged in.
> > > > > No storm if not configured and cable plugged in
> > > > > No storm if configured and no cable
> > > > >
> > > > > Storm start when I plug the cable in.
> > > >
> > > > Sounds like an unexpected interrupt source that should probably be masked.
> > > >
> > > > I would look at rge_intr() and what status you get and compare it to the
> > > > RGE_ISR defines. This may help to figure out what is going on.
> > > >
> > > > --
> > > > :wq Claudio
> > >
> > > The value of status after the RGE_READ_4 call is 0x10 all the time:
> > > RGE_ISR_RX_DESC_UNAVAIL
> > >
> > > -Otto
> > >
> >
> > If I apply the diff below the device starts to work without interrupt storm.
> > This is pure blind coding, I have little idea what I'm doing...
> >
> > -Otto
> >
> > Index: dev/pci/if_rgereg.h
> > ===================================================================
> > RCS file: /cvs/src/sys/dev/pci/if_rgereg.h,v
> > retrieving revision 1.4
> > diff -u -p -r1.4 if_rgereg.h
> > --- dev/pci/if_rgereg.h 31 Oct 2020 07:50:41 -0000 1.4
> > +++ dev/pci/if_rgereg.h 21 Nov 2020 13:06:39 -0000
> > @@ -88,7 +88,7 @@
> >
> >  #define RGE_INTRS \
> >     (RGE_ISR_RX_OK | RGE_ISR_RX_ERR | RGE_ISR_TX_OK | \
> > -   RGE_ISR_TX_ERR | RGE_ISR_RX_DESC_UNAVAIL | RGE_ISR_LINKCHG | \
> > +   RGE_ISR_TX_ERR | RGE_ISR_LINKCHG | \
> >     RGE_ISR_TX_DESC_UNAVAIL | RGE_ISR_PCS_TIMEOUT | RGE_ISR_SYSTEM_ERR)
> >
> >  #define RGE_INTRS_TIMER \
>
> That makes some sense. The description of that bit suggests this is
> the interrupt you get when a packet arrives but there is no room in
> the rx ring for it.
>
> That bit isn't actually all that useful. It could be used to account
> dropped packets, but it isn't. It also could provide a trigger to
> refill the ring if for some reason we end up with an empty rx ring.
> In practice that doesn't work so well though, since a steady stream of
> packets will mean the interrupt will keep on firing and potentially
> keep the kernel from doing what it needs to free up mbufs such that
> they can be put back on the ring. It is better to use a timeout to
> refill the ring if the minimum number of mbufs on the ring can't be
> maintained.
>
> The question remains why the interrupt keeps firing in a scenario
> where the ring should have enough packets on it. But the answer may
> turn out to be irrelevant.
>

This patch really helped on an Odroid H2+ as well.
https://www.hardkernel.com/shop/odroid-h2plus/

# Before

fw1$ vmstat -i
interrupt                       total     rate
irq0/clock                   70183801      399
irq0/ipi                        24695        0
irq144/inteldrm0                 1161        0
irq176/azalia0                      5        0
irq101/nvme0                  1080485        6
irq114/rge0               38290462803   217809
irq115/rge1               38513876853   219080
irq105/sdhc0                        6        0
Total                     76875629809   437295

# Patch

https://marc.info/?l=openbsd-bugs&m=160596450222340&w=2

Index: dev/pci/if_rgereg.h
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_rgereg.h,v
retrieving revision 1.4
diff -u -p -r1.4 if_rgereg.h
--- dev/pci/if_rgereg.h 31 Oct 2020 07:50:41 -0000 1.4
+++ dev/pci/if_rgereg.h 21 Nov 2020 13:06:39 -0000
@@ -88,7 +88,7 @@

 #define RGE_INTRS \
    (RGE_ISR_RX_OK | RGE_ISR_RX_ERR | RGE_ISR_TX_OK | \
-   RGE_ISR_TX_ERR | RGE_ISR_RX_DESC_UNAVAIL | RGE_ISR_LINKCHG | \
+   RGE_ISR_TX_ERR | RGE_ISR_LINKCHG | \
    RGE_ISR_TX_DESC_UNAVAIL | RGE_ISR_PCS_TIMEOUT | RGE_ISR_SYSTEM_ERR)

 #define RGE_INTRS_TIMER \

# After

fw1$ vmstat -i
interrupt                       total     rate
irq0/clock                     393885      399
irq0/ipi                        12756       12
irq144/inteldrm0                 1157        1
irq176/azalia0                      5        0
irq101/nvme0                    35528       36
irq114/rge0                      8356        8
irq115/rge1                      3294        3
irq105/sdhc0                        6        0
Total                          454987      461
