On Wed, Jan 06, 2021 at 02:34:14PM -0800, xSAPPYx wrote:
> On Sun, Nov 22, 2020 at 6:19 AM Mark Kettenis wrote:
>
> > > Date: Sat, 21 Nov 2020 14:14:21 +0100
> > > From: Otto Moerbeek
> > >
> > > On Fri, Nov 20, 2020 at 04:25:31PM +0100, Otto Moerbeek wrote:
> > >
> > > > On Fri, Nov 20, 2020 at 02:01:55PM +0100, Claudio Jeker wrote:
> > > >
> > > > > On Fri, Nov 20, 2020 at 11:32:18AM +0100, Otto Moerbeek wrote:
> > > > > > On Fri, Nov 20, 2020 at 11:09:25AM +0100, Mark Kettenis wrote:
> > > > > >
> > > > > > > It's a relatively new driver. It uses MSI, which pretty much rules out
> > > > > > > an issue with shared interrupts. So I suspect this is an issue with
> > > > > > > the rge(4) driver. In the past we had fun with packet counter
> > > > > > > overflow interrupts. Is the storm present immediately after you bring
> > > > > > > up the interface? Or even before?
> > > > > >
> > > > > > No storm if not configured and no cable plugged in.
> > > > > > No storm if not configured and cable plugged in
> > > > > > No storm if configured and no cable
> > > > > >
> > > > > > Storm start when I plug the cable in.
> > > > >
> > > > > Sounds like an unexpected interrupt source that should probably be masked.
> > > > >
> > > > > I would look at rge_intr() and what status you get and compare it to the
> > > > > RGE_ISR defines. This may help to figure out what is going on.
> > > > >
> > > > > --
> > > > > :wq Claudio
> > > >
> > > > The value of status after the RGE_READ_4 call is 0x10 all the time:
> > > > RGE_ISR_RX_DESC_UNAVAIL
> > > >
> > > > -Otto
> > > >
> > >
> > > If I apply the diff below, the device starts to work without an interrupt storm.
> > > This is pure blind coding; I have little idea what I'm doing...
> > >
> > > -Otto
> > >
> > > Index: dev/pci/if_rgereg.h
> > > ===================================================================
> > > RCS file: /cvs/src/sys/dev/pci/if_rgereg.h,v
> > > retrieving revision 1.4
> > > diff -u -p -r1.4 if_rgereg.h
> > > --- dev/pci/if_rgereg.h 31 Oct 2020 07:50:41 -0000 1.4
> > > +++ dev/pci/if_rgereg.h 21 Nov 2020 13:06:39 -0000
> > > @@ -88,7 +88,7 @@
> > >
> > > #define RGE_INTRS \
> > > (RGE_ISR_RX_OK | RGE_ISR_RX_ERR | RGE_ISR_TX_OK | \
> > > - RGE_ISR_TX_ERR | RGE_ISR_RX_DESC_UNAVAIL | RGE_ISR_LINKCHG | \
> > > + RGE_ISR_TX_ERR | RGE_ISR_LINKCHG | \
> > > RGE_ISR_TX_DESC_UNAVAIL | RGE_ISR_PCS_TIMEOUT | RGE_ISR_SYSTEM_ERR)
> > >
> > > #define RGE_INTRS_TIMER \
> >
> > That makes some sense. The description of that bit suggests this is
> > the interrupt you get when a packet arrives but there is no room in
> > the rx ring for it.
> >
> > That bit isn't actually all that useful. It could be used to account for
> > dropped packets, but it isn't. It also could provide a trigger to
> > refill the ring if for some reason we end up with an empty rx ring.
> > In practice that doesn't work so well though, since a steady stream of
> > packets will mean the interrupt will keep on firing and potentially
> > keep the kernel from doing what it needs to free up mbufs such that
> > they can be put back on the ring. It is better to use a timeout to
> > refill the ring if the minimum number of mbufs on the ring can't be
> > maintained.
> >
> > The question remains why the interrupt keeps firing in a scenario
> > where the ring should have enough packets on it. But the answer may
> > turn out to be irrelevant.
> >
> >
>
> This patch really helped on an Odroid H2+ as well.
> https://www.hardkernel.com/shop/odroid-h2plus/
>
> # Before
> fw1$ vmstat -i
> interrupt               total     rate
> irq0/clock           70183801      399
> irq0/ipi                24695        0
> irq144/inteldrm0         1161        0
> irq176/azalia0              5        0
> irq101/nvme0          1080485        6
> irq114/rge0       38290462803   217809
> irq115/rge1       38513876853   219080
> irq105/sdhc0                6        0
> Total             76875629809   437295
>
>
> # Patch
> https://marc.info/?l=openbsd-bugs&m=160596450222340&w=2
>
> Index: dev/pci/if_rgereg.h
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/if_rgereg.h,v
> retrieving revision 1.4
> diff -u -p -r1.4 if_rgereg.h
> --- dev/pci/if_rgereg.h 31 Oct 2020 07:50:41 -0000 1.4
> +++ dev/pci/if_rgereg.h 21 Nov 2020 13:06:39 -0000
> @@ -88,7 +88,7 @@
>
> #define RGE_INTRS \
> (RGE_ISR_RX_OK | RGE_ISR_RX_ERR | RGE_ISR_TX_OK | \
> - RGE_ISR_TX_ERR | RGE_ISR_RX_DESC_UNAVAIL | RGE_ISR_LINKCHG | \
> + RGE_ISR_TX_ERR | RGE_ISR_LINKCHG | \
> RGE_ISR_TX_DESC_UNAVAIL | RGE_ISR_PCS_TIMEOUT | RGE_ISR_SYSTEM_ERR)
>
> #define RGE_INTRS_TIMER \
>
>
> # After
> fw1$ vmstat -i
> interrupt               total     rate
> irq0/clock             393885      399
> irq0/ipi                12756       12
> irq144/inteldrm0         1157        1
> irq176/azalia0              5        0
> irq101/nvme0            35528       36
> irq114/rge0              8356        8
> irq115/rge1              3294        3
> irq105/sdhc0                6        0
> Total                  454987      461

Thanks, this patch was committed, so -current has it.
-Otto