On Tue, Sep 13, 2016 at 08:50 +0000, Olivier Cherrier wrote:
> >Synopsis: crash with oce(4)
> >Category: network
> >Environment:
> System : OpenBSD 6.0
> Details : OpenBSD 6.0 (GENERIC.MP) #2319: Tue Jul 26 13:00:43 MDT 2016
>           [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> Architecture: OpenBSD.amd64
> Machine : amd64
> >Description:
>
> After upgrading systems from 5.9 (with patch 004) to 6.0, I am getting
> a crash a few seconds after the network is configured. The problem
> seems to be linked to oce(4) and pool, and at least not to carp/vlan,
> since I can reproduce the crash with just «ifconfig ocex up» commands,
> as shown here while booting in single-user mode:
>
I didn't test CARP, but I couldn't reproduce this with vlans on
top of a trunk on top of two oce's with 6.0-release. I will
double-check -current tomorrow. I can't explain the "missing
descriptor in rxeof" message other than by a stray interrupt
with a valid completion queue entry, which would be rather weird.
Perhaps we're not filling the Rx ring with enough slots, and the
card receives a heavily fragmented jumbo frame that it manages to
fit only partially into the provided space. How about this diff?
diff --git sys/dev/pci/if_oce.c sys/dev/pci/if_oce.c
index ee74185..a74b35b 100644
--- sys/dev/pci/if_oce.c
+++ sys/dev/pci/if_oce.c
@@ -1078,7 +1078,7 @@ oce_init(void *arg)
 		rq->ring->index = 0;
 
 		/* oce splits jumbos into 2k chunks... */
-		if_rxr_init(&rq->rxring, 8, rq->nitems);
+		if_rxr_init(&rq->rxring, OCE_MAX_TX_ELEMENTS, rq->nitems);
 
 		if (!oce_alloc_rx_bufs(rq)) {
 			printf("%s: failed to allocate rx buffers\n",
@@ -1560,8 +1560,8 @@ oce_rxeof(struct oce_rq *rq, struct oce_nic_rx_cqe *cqe)
 
 	for (i = 0; i < cqe->u0.s.num_fragments; i++) {
 		if ((pkt = oce_pkt_get(&rq->pkt_list)) == NULL) {
-			printf("%s: missing descriptor in rxeof\n",
-			    sc->sc_dev.dv_xname);
+			printf("%s: missing descriptor in rxeof, frag %d/%u\n",
+			    sc->sc_dev.dv_xname, i, cqe->u0.s.num_fragments);
 			goto exit;
 		}
 