On Sun, Nov 22, 2020 at 6:19 AM Mark Kettenis <[email protected]> wrote:

> > Date: Sat, 21 Nov 2020 14:14:21 +0100
> > From: Otto Moerbeek <[email protected]>
> >
> > On Fri, Nov 20, 2020 at 04:25:31PM +0100, Otto Moerbeek wrote:
> >
> > > On Fri, Nov 20, 2020 at 02:01:55PM +0100, Claudio Jeker wrote:
> > >
> > > > On Fri, Nov 20, 2020 at 11:32:18AM +0100, Otto Moerbeek wrote:
> > > > > On Fri, Nov 20, 2020 at 11:09:25AM +0100, Mark Kettenis wrote:
> > > > >
> > > > > > It's a relatively new driver.  It uses MSI, which pretty much rules
> > > > > > out an issue with shared interrupts.  So I suspect this is an issue
> > > > > > with the rge(4) driver.  In the past we have had fun with packet
> > > > > > counter overflow interrupts.  Is the storm present immediately after
> > > > > > you bring up the interface?  Or even before?
> > > > >
> > > > > No storm if not configured and no cable plugged in.
> > > > > No storm if not configured and cable plugged in.
> > > > > No storm if configured and no cable.
> > > > >
> > > > > Storm starts when I plug the cable in.
> > > >
> > > > Sounds like an unexpected interrupt source that should probably be
> > > > masked.
> > > >
> > > > I would look at rge_intr() and what status you get and compare it to
> > > > the RGE_ISR defines. This may help to figure out what is going on.
> > > >
> > > > --
> > > > :wq Claudio
> > >
> > > The value of status after the RGE_READ_4 call is 0x10 all the time:
> > > RGE_ISR_RX_DESC_UNAVAIL
> > >
> > >     -Otto
> > >
> >
> > If I apply the diff below, the device starts to work without an
> > interrupt storm.
> > This is pure blind coding, I have little idea what I'm doing...
> >
> >       -Otto
> >
> > Index: dev/pci/if_rgereg.h
> > ===================================================================
> > RCS file: /cvs/src/sys/dev/pci/if_rgereg.h,v
> > retrieving revision 1.4
> > diff -u -p -r1.4 if_rgereg.h
> > --- dev/pci/if_rgereg.h       31 Oct 2020 07:50:41 -0000      1.4
> > +++ dev/pci/if_rgereg.h       21 Nov 2020 13:06:39 -0000
> > @@ -88,7 +88,7 @@
> >
> >  #define RGE_INTRS            \
> >       (RGE_ISR_RX_OK | RGE_ISR_RX_ERR | RGE_ISR_TX_OK |               \
> > -     RGE_ISR_TX_ERR | RGE_ISR_RX_DESC_UNAVAIL | RGE_ISR_LINKCHG |    \
> > +     RGE_ISR_TX_ERR | RGE_ISR_LINKCHG |      \
> >       RGE_ISR_TX_DESC_UNAVAIL | RGE_ISR_PCS_TIMEOUT | RGE_ISR_SYSTEM_ERR)
> >
> >  #define RGE_INTRS_TIMER              \
>
> That makes some sense.  The description of that bit suggests this is
> the interrupt you get when a packet arrives but there is no room in
> the rx ring for it.
>
> That bit isn't actually all that useful.  It could be used to account
> for dropped packets, but it isn't.  It also could provide a trigger to
> refill the ring if for some reason we end up with an empty rx ring.
> In practice that doesn't work so well though, since a steady stream of
> packets will mean the interrupt will keep on firing and potentially
> keep the kernel from doing what it needs to free up mbufs such that
> they can be put back on the ring.  It is better to use a timeout to
> refill the ring if the minimum number of mbufs on the ring can't be
> maintained.
>
> The question remains why the interrupt keeps firing in a scenario
> where the ring should have enough packets on it.  But the answer may
> turn out to be irrelevant.
>
>
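To make the "use a timeout to refill the ring" suggestion above concrete,
here is a minimal userspace sketch of the idea. It is not kernel code and
not taken from rge(4): the ring size, the low-water mark, and the names
rx_ring, rx_fill() and rx_refill_tick() are all hypothetical, and the
"timer" is just a flag that a periodic callback would check.

```c
#include <assert.h>
#include <stdio.h>

#define RX_RING_SIZE	8	/* hypothetical number of rx slots */
#define RX_RING_LWM	2	/* hypothetical minimum filled slots */

struct rx_ring {
	int filled;		/* slots that currently hold a buffer */
	int timer_armed;	/* nonzero if a refill retry is pending */
};

/*
 * Put as many buffers on the ring as the allocator can supply.
 * *avail stands in for the buffers the allocator can hand out now.
 * If the ring stays below the low-water mark, arm the retry timer
 * instead of relying on a "descriptor unavailable" interrupt.
 * Returns the number of buffers added.
 */
static int
rx_fill(struct rx_ring *r, int *avail)
{
	int added = 0;

	while (r->filled < RX_RING_SIZE && *avail > 0) {
		r->filled++;
		(*avail)--;
		added++;
	}
	r->timer_armed = (r->filled < RX_RING_LWM);
	return added;
}

/* Timer callback: retry the refill only if a retry was requested. */
static void
rx_refill_tick(struct rx_ring *r, int *avail)
{
	if (r->timer_armed)
		rx_fill(r, avail);
}

/* Walk through one "allocator empty, then recovered" cycle. */
static void
rx_refill_demo(void)
{
	struct rx_ring r = { 0, 0 };
	int avail = 0;

	rx_fill(&r, &avail);		/* nothing to give: timer armed */
	assert(r.timer_armed);
	avail = 5;
	rx_refill_tick(&r, &avail);	/* buffers are back: ring refilled */
	printf("filled=%d armed=%d\n", r.filled, r.timer_armed);
}
```

The point of the design is that the retry runs at the timer's pace, so a
steady stream of incoming packets cannot keep re-triggering it the way the
rx-descriptor-unavailable interrupt can.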

This patch really helped on an Odroid H2+ as well.
https://www.hardkernel.com/shop/odroid-h2plus/

# Before
fw1$ vmstat -i
interrupt                       total     rate
irq0/clock                   70183801      399
irq0/ipi                        24695        0
irq144/inteldrm0                 1161        0
irq176/azalia0                      5        0
irq101/nvme0                  1080485        6
irq114/rge0               38290462803   217809
irq115/rge1               38513876853   219080
irq105/sdhc0                        6        0
Total                     76875629809   437295


# Patch
https://marc.info/?l=openbsd-bugs&m=160596450222340&w=2

Index: dev/pci/if_rgereg.h
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_rgereg.h,v
retrieving revision 1.4
diff -u -p -r1.4 if_rgereg.h
--- dev/pci/if_rgereg.h 31 Oct 2020 07:50:41 -0000  1.4
+++ dev/pci/if_rgereg.h 21 Nov 2020 13:06:39 -0000
@@ -88,7 +88,7 @@

 #define RGE_INTRS      \
    (RGE_ISR_RX_OK | RGE_ISR_RX_ERR | RGE_ISR_TX_OK |       \
-   RGE_ISR_TX_ERR | RGE_ISR_RX_DESC_UNAVAIL | RGE_ISR_LINKCHG |    \
+   RGE_ISR_TX_ERR | RGE_ISR_LINKCHG |  \
    RGE_ISR_TX_DESC_UNAVAIL | RGE_ISR_PCS_TIMEOUT | RGE_ISR_SYSTEM_ERR)

 #define RGE_INTRS_TIMER        \
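
For anyone wondering what the one-line change does, here is a small
standalone sketch of the mask arithmetic. Only the value 0x10 for
RGE_ISR_RX_DESC_UNAVAIL comes from this thread (Otto's report); the other
bit values are my assumption and should be checked against if_rgereg.h.
The names RGE_INTRS_OLD, RGE_INTRS_NEW and rge_source_enabled() are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 0x10 is the status value reported in the thread. */
#define RGE_ISR_RX_DESC_UNAVAIL	0x00000010
/* The values below are assumed; verify against if_rgereg.h. */
#define RGE_ISR_RX_OK		0x00000001
#define RGE_ISR_RX_ERR		0x00000002
#define RGE_ISR_TX_OK		0x00000004
#define RGE_ISR_TX_ERR		0x00000008
#define RGE_ISR_LINKCHG		0x00000020
#define RGE_ISR_TX_DESC_UNAVAIL	0x00000080
#define RGE_ISR_PCS_TIMEOUT	0x00004000
#define RGE_ISR_SYSTEM_ERR	0x00008000

/* Mask before the patch (hypothetical name). */
#define RGE_INTRS_OLD							\
	(RGE_ISR_RX_OK | RGE_ISR_RX_ERR | RGE_ISR_TX_OK |		\
	RGE_ISR_TX_ERR | RGE_ISR_RX_DESC_UNAVAIL | RGE_ISR_LINKCHG |	\
	RGE_ISR_TX_DESC_UNAVAIL | RGE_ISR_PCS_TIMEOUT | RGE_ISR_SYSTEM_ERR)

/* Mask after the patch (hypothetical name). */
#define RGE_INTRS_NEW							\
	(RGE_ISR_RX_OK | RGE_ISR_RX_ERR | RGE_ISR_TX_OK |		\
	RGE_ISR_TX_ERR | RGE_ISR_LINKCHG |				\
	RGE_ISR_TX_DESC_UNAVAIL | RGE_ISR_PCS_TIMEOUT | RGE_ISR_SYSTEM_ERR)

/* Nonzero if any source set in status is enabled by mask. */
static int
rge_source_enabled(uint32_t status, uint32_t mask)
{
	return (status & mask) != 0;
}

/* The reported status 0x10 is enabled before the patch, not after. */
static void
rge_mask_selfcheck(void)
{
	assert(rge_source_enabled(0x10, RGE_INTRS_OLD));
	assert(!rge_source_enabled(0x10, RGE_INTRS_NEW));
}
```

With the old mask the constant 0x10 status is an enabled source; with the
new mask it is not, while rx ok, tx ok, and link change are still enabled.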


# After
fw1$ vmstat -i
interrupt                       total     rate
irq0/clock                     393885      399
irq0/ipi                        12756       12
irq144/inteldrm0                 1157        1
irq176/azalia0                      5        0
irq101/nvme0                    35528       36
irq114/rge0                      8356        8
irq115/rge1                      3294        3
irq105/sdhc0                        6        0
Total                          454987      461
