On 15/09/16(Thu) 14:10, Olivier Cherrier wrote:
> On Thu, Sep 15, 2016 at 08:25:10AM +0000, [email protected] wrote:
> > On Wed, Sep 14, 2016 at 09:46:35PM +0200, [email protected] wrote:
> > > On Tue, Sep 13, 2016 at 08:50 +0000, Olivier Cherrier wrote:
> > > > >Synopsis:      crash with oce(4)
> > > > >Category:      network
> > > > >Environment:
> > > >         System      : OpenBSD 6.0
> > > >         Details     : OpenBSD 6.0 (GENERIC.MP) #2319: Tue Jul 26 13:00:43 MDT 2016
> > > >                       [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > > 
> > > >         Architecture: OpenBSD.amd64
> > > >         Machine     : amd64
> > > > >Description:
> > > > 
> > > > After upgrading systems from 5.9 (with patch 004) to 6.0, I am getting a
> > > > crash a few seconds after the network is configured. The problem seems
> > > > to be linked to oce(4) and pool, and at least not to carp/vlan, since
> > > > I can reproduce the crash with just «ifconfig ocex up» commands, as
> > > > shown here while booting in single-user mode:
> > > > 
> > > 
> > > I didn't test CARP, but I couldn't reproduce this with vlans on
> > > top of a trunk on top of two oce's with 6.0-release.  I will
> > > double check -current tomorrow.  I don't see a good reason for
> > > the "missing descriptor in rxeof" unless it's a stray interrupt
> > > with a valid completion queue entry which is a bit too weird.
> > > 
> > > Perhaps we're not filling the Rx ring with enough slots and get
> > > a heavily fragmented jumbo frame that the card has managed to
> > > only partially fit into provided space.  How about this diff?
> > > 
> > > diff --git sys/dev/pci/if_oce.c sys/dev/pci/if_oce.c
> > > index ee74185..a74b35b 100644
> > > --- sys/dev/pci/if_oce.c
> > > +++ sys/dev/pci/if_oce.c
> > > @@ -1078,7 +1078,7 @@ oce_init(void *arg)
> > >           rq->ring->index  = 0;
> > >  
> > >           /* oce splits jumbos into 2k chunks... */
> > > -         if_rxr_init(&rq->rxring, 8, rq->nitems);
> > > +         if_rxr_init(&rq->rxring, OCE_MAX_TX_ELEMENTS, rq->nitems);
> > >  
> > >           if (!oce_alloc_rx_bufs(rq)) {
> > >                   printf("%s: failed to allocate rx buffers\n",
> > > @@ -1560,8 +1560,8 @@ oce_rxeof(struct oce_rq *rq, struct oce_nic_rx_cqe 
> > > *cqe)
> > >  
> > >   for (i = 0; i < cqe->u0.s.num_fragments; i++) {
> > >           if ((pkt = oce_pkt_get(&rq->pkt_list)) == NULL) {
> > > -                 printf("%s: missing descriptor in rxeof\n",
> > > -                     sc->sc_dev.dv_xname);
> > > +                 printf("%s: missing descriptor in rxeof, frag %d/%u\n",
> > > +                     sc->sc_dev.dv_xname, i, cqe->u0.s.num_fragments);
> > >                   goto exit;
> > >           }
> > >  
> > > 
> > 
> > 
> >     Hi Mike,
> > 
> > With Current and your patch, it is stable (no crash) and there is no
> > "missing descriptor in rxeof" message anymore.
> > 
> > But there is still the vlan part that is not working.
> 
> 
> More precisely, it seems vlan is not working when I try to define it on
> top of a trunk.
> 
> I moved all the hostname.* files into a directory called
> "/etc/hostname.ALL".  Then I booted from scratch (so with a 'blank'
> network config) and experimented this way:
> 
> 
> # cd /etc/hostname.ALL/
> # ls -la
> total 48
> drwxr-xr-x   2 root  wheel   512 Sep 15 15:22 .
> drwxr-xr-x  24 root  wheel  2048 Sep 15 15:22 ..
> -rw-r-----   1 root  wheel    85 Dec 15  2015 hostname.carp0
> -rw-r-----   1 root  wheel   100 Dec 15  2015 hostname.carp1
> -rw-r-----   1 root  wheel    73 Jan 27  2016 hostname.carp2
> -rw-r-----   1 root  wheel    88 Jan 27  2016 hostname.carp3
> -rw-r-----   1 root  wheel     3 Dec 15  2015 hostname.oce0
> -rw-r-----   1 root  wheel     3 Dec 15  2015 hostname.oce1
> -rw-r-----   1 root  wheel    33 Dec 15  2015 hostname.pfsync0
> -rw-r-----   1 root  wheel    56 Dec 15  2015 hostname.trunk0
> -rw-r-----   1 root  wheel    58 Dec 15  2015 hostname.vlan1
> -rw-r-----   1 root  wheel    60 Dec 15  2015 hostname.vlan20
> # 
> # cat hostname.vlan20 
> vlan 20 vlandev trunk0
> inet x.x.x.x 255.255.255.0
> up
> # 
> # ifconfig oce0 up
> # ifconfig vlan20 create
> # ifconfig vlan20 vlan 20 vlandev oce0
> # ifconfig vlan20 inet x.x.x.x 255.255.255.0
> # ifconfig vlan20 up
> # 
> #
> # echo it works
> it works
> # 
> # 
> # ifconfig vlan
> vlan20: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>         lladdr 00:17:a4:77:04:3e
>         index 10 priority 0 llprio 3
>         vlan: 20 parent interface: oce0
>         vnetid: 20
>         parent: oce0
>         groups: vlan
>         status: active
>         inet x.x.x.x netmask 0xffffff00 broadcast x.x.x.x
> # 
> # ifconfig vlan20 destroy
> # 
> # ifconfig vlan
> vlan: no such interface
> # ifconfig oce1 up
> # 
> # cat hostname.trunk0 
> trunkport oce0 trunkport oce1 trunkproto loadbalance
> up
> # 
> # ifconfig trunk0 create
> # ifconfig trunk0 trunkport oce0 trunkport oce1 trunkproto loadbalance
> # ifconfig trunk0 up
> # 
> # 
> # ifconfig vlan20 create
> # ifconfig vlan20 vlan 20 vlandev trunk0
> ifconfig: SIOCSETVLAN: No buffer space available
> # 
> 
> 
> So it doesn't seem to be related to oce(4) but rather to trunk(4).

Judging by the error message, it seems related to the mtu and hardmtu
values of trunk(4), which are inherited from oce(4).
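
As a small aid for reading that message: "No buffer space available" is the
standard strerror(3) text for ENOBUFS.  The sketch below only maps the
message back to an errno name; the helper names and the short candidate list
are made up for illustration, and it says nothing about which kernel check
actually returns the error.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: symbolic name for a few errno values, or "?". */
static const char *
errno_name(int e)
{
	switch (e) {
	case ENOBUFS:
		return "ENOBUFS";
	case EINVAL:
		return "EINVAL";
	case ENOMEM:
		return "ENOMEM";
	default:
		return "?";
	}
}

/*
 * Hypothetical helper: return the candidate errno whose strerror(3) text
 * matches msg exactly, or 0 if none of the candidates matches.
 */
static int
errno_from_message(const char *msg)
{
	static const int candidates[] = { ENOBUFS, EINVAL, ENOMEM };
	size_t i;

	for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++)
		if (strcmp(strerror(candidates[i]), msg) == 0)
			return candidates[i];
	return 0;
}
```

Calling errno_name(errno_from_message("No buffer space available")) yields
"ENOBUFS", so that is the value to look for in the code path behind
SIOCSETVLAN.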
