On Wed, Jan 06, 2021 at 02:34:14PM -0800, xSAPPYx wrote:

> On Sun, Nov 22, 2020 at 6:19 AM Mark Kettenis <[email protected]>
> wrote:
> 
> > > Date: Sat, 21 Nov 2020 14:14:21 +0100
> > > From: Otto Moerbeek <[email protected]>
> > >
> > > On Fri, Nov 20, 2020 at 04:25:31PM +0100, Otto Moerbeek wrote:
> > >
> > > > On Fri, Nov 20, 2020 at 02:01:55PM +0100, Claudio Jeker wrote:
> > > >
> > > > > On Fri, Nov 20, 2020 at 11:32:18AM +0100, Otto Moerbeek wrote:
> > > > > > On Fri, Nov 20, 2020 at 11:09:25AM +0100, Mark Kettenis wrote:
> > > > > >
> > > > > > > > It's a relatively new driver.  It uses MSI, which pretty much
> > > > > > > > rules out an issue with shared interrupts.  So I suspect this
> > > > > > > > is an issue with the rge(4) driver.  In the past we have had
> > > > > > > > fun with packet counter overflow interrupts.  Is the storm
> > > > > > > > present immediately after you bring up the interface?  Or
> > > > > > > > even before?
> > > > > >
> > > > > > > No storm if not configured and no cable plugged in.
> > > > > > > No storm if not configured and cable plugged in.
> > > > > > > No storm if configured and no cable.
> > > > > > >
> > > > > > > Storm starts when I plug the cable in.
> > > > >
> > > > > > Sounds like an unexpected interrupt source that should probably be
> > > > > > masked.
> > > > > >
> > > > > > I would look at rge_intr() and what status you get and compare it to
> > > > > > the RGE_ISR defines. This may help to figure out what is going on.
> > > > >
> > > > > --
> > > > > :wq Claudio
> > > >
> > > > The value of status after the RGE_READ_4 call is 0x10 all the time:
> > > > RGE_ISR_RX_DESC_UNAVAIL
> > > >
> > > >     -Otto
> > > >
> > >
> > > > If I apply the diff below the device starts to work without an
> > > > interrupt storm.
> > > > This is pure blind coding, I have little idea what I'm doing...
> > >
> > >       -Otto
> > >
> > > Index: dev/pci/if_rgereg.h
> > > ===================================================================
> > > RCS file: /cvs/src/sys/dev/pci/if_rgereg.h,v
> > > retrieving revision 1.4
> > > diff -u -p -r1.4 if_rgereg.h
> > > --- dev/pci/if_rgereg.h       31 Oct 2020 07:50:41 -0000      1.4
> > > +++ dev/pci/if_rgereg.h       21 Nov 2020 13:06:39 -0000
> > > @@ -88,7 +88,7 @@
> > >
> > >  #define RGE_INTRS            \
> > >       (RGE_ISR_RX_OK | RGE_ISR_RX_ERR | RGE_ISR_TX_OK |               \
> > > -     RGE_ISR_TX_ERR | RGE_ISR_RX_DESC_UNAVAIL | RGE_ISR_LINKCHG |    \
> > > +     RGE_ISR_TX_ERR | RGE_ISR_LINKCHG |      \
> > >       RGE_ISR_TX_DESC_UNAVAIL | RGE_ISR_PCS_TIMEOUT | RGE_ISR_SYSTEM_ERR)
> > >
> > >  #define RGE_INTRS_TIMER              \
> >
> > That makes some sense.  The description of that bit suggests this is
> > the interrupt you get when a packet arrives but there is no room in
> > the rx ring for it.
> >
> > That bit isn't actually all that useful.  It could be used to account
> > for dropped packets, but it isn't.  It also could provide a trigger to
> > refill the ring if for some reason we end up with an empty rx ring.
> > In practice that doesn't work so well though, since a steady stream of
> > packets will mean the interrupt will keep on firing and potentially
> > keep the kernel from doing what it needs to free up mbufs such that
> > they can be put back on the ring.  It is better to use a timeout to
> > refill the ring if the minimum number of mbufs on the ring can't be
> > maintained.
> >
> > The question remains why the interrupt keeps firing in a scenario
> > where the ring should have enough packets on it.  But the answer may
> > turn out to be irrelevant.
> >
> >
> 
> This patch really helped on an Odroid H2+ as well.
> https://www.hardkernel.com/shop/odroid-h2plus/
> 
> # Before
> fw1$ vmstat -i
> interrupt                       total     rate
> irq0/clock                   70183801      399
> irq0/ipi                        24695        0
> irq144/inteldrm0                 1161        0
> irq176/azalia0                      5        0
> irq101/nvme0                  1080485        6
> irq114/rge0               38290462803   217809
> irq115/rge1               38513876853   219080
> irq105/sdhc0                        6        0
> Total                     76875629809   437295
> 
> 
> # Patch
> https://marc.info/?l=openbsd-bugs&m=160596450222340&w=2
> 
> Index: dev/pci/if_rgereg.h
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/if_rgereg.h,v
> retrieving revision 1.4
> diff -u -p -r1.4 if_rgereg.h
> --- dev/pci/if_rgereg.h 31 Oct 2020 07:50:41 -0000  1.4
> +++ dev/pci/if_rgereg.h 21 Nov 2020 13:06:39 -0000
> @@ -88,7 +88,7 @@
> 
>  #define RGE_INTRS      \
>     (RGE_ISR_RX_OK | RGE_ISR_RX_ERR | RGE_ISR_TX_OK |       \
> -   RGE_ISR_TX_ERR | RGE_ISR_RX_DESC_UNAVAIL | RGE_ISR_LINKCHG |    \
> +   RGE_ISR_TX_ERR | RGE_ISR_LINKCHG |  \
>     RGE_ISR_TX_DESC_UNAVAIL | RGE_ISR_PCS_TIMEOUT | RGE_ISR_SYSTEM_ERR)
> 
>  #define RGE_INTRS_TIMER        \
> 
> 
> # After
> fw1$ vmstat -i
> interrupt                       total     rate
> irq0/clock                     393885      399
> irq0/ipi                        12756       12
> irq144/inteldrm0                 1157        1
> irq176/azalia0                      5        0
> irq101/nvme0                    35528       36
> irq114/rge0                      8356        8
> irq115/rge1                      3294        3
> irq105/sdhc0                        6        0
> Total                          454987      461

Thanks, this patch was committed so -current has it.

        -Otto
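
For anyone reading this in the archive, here is a minimal sketch of what the
one-line change to RGE_INTRS does to the status value reported earlier in the
thread. Only 0x10 for RGE_ISR_RX_DESC_UNAVAIL is confirmed above; the other
bit values are assumptions made for illustration and should be checked against
if_rgereg.h. The names RGE_INTRS_OLD, RGE_INTRS_NEW and status_is_handled()
are hypothetical and do not exist in the driver. The sketch also assumes that
RGE_INTRS is the set of sources the driver asks the chip to raise.

```c
#include <stdint.h>

/*
 * Illustrative bit values. The thread confirms only that 0x10 is
 * RGE_ISR_RX_DESC_UNAVAIL; the remaining values are assumptions.
 */
#define RGE_ISR_RX_OK			0x00000001
#define RGE_ISR_RX_ERR			0x00000002
#define RGE_ISR_TX_OK			0x00000004
#define RGE_ISR_TX_ERR			0x00000008
#define RGE_ISR_RX_DESC_UNAVAIL		0x00000010
#define RGE_ISR_LINKCHG			0x00000020
#define RGE_ISR_TX_DESC_UNAVAIL		0x00000080
#define RGE_ISR_PCS_TIMEOUT		0x00004000
#define RGE_ISR_SYSTEM_ERR		0x00008000

/* Mask as it was before the patch (hypothetical name). */
#define RGE_INTRS_OLD			\
	(RGE_ISR_RX_OK | RGE_ISR_RX_ERR | RGE_ISR_TX_OK |		\
	RGE_ISR_TX_ERR | RGE_ISR_RX_DESC_UNAVAIL | RGE_ISR_LINKCHG |	\
	RGE_ISR_TX_DESC_UNAVAIL | RGE_ISR_PCS_TIMEOUT | RGE_ISR_SYSTEM_ERR)

/* Mask after the patch: RX_DESC_UNAVAIL dropped (hypothetical name). */
#define RGE_INTRS_NEW			\
	(RGE_ISR_RX_OK | RGE_ISR_RX_ERR | RGE_ISR_TX_OK |		\
	RGE_ISR_TX_ERR | RGE_ISR_LINKCHG |				\
	RGE_ISR_TX_DESC_UNAVAIL | RGE_ISR_PCS_TIMEOUT | RGE_ISR_SYSTEM_ERR)

/*
 * Hypothetical helper: returns 1 if any bit set in status is part of
 * the given mask, 0 otherwise.
 */
int
status_is_handled(uint32_t status, uint32_t mask)
{
	return (status & mask) != 0;
}
```

With these definitions, status_is_handled(0x10, RGE_INTRS_OLD) is true and
status_is_handled(0x10, RGE_INTRS_NEW) is false, so the status seen during
the storm is no longer one of the enabled sources.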
