Re: 8.0-RELEASE-p3: 4k jumbo mbuf cluster exhaustion

Andre Oppermann Mon, 23 Aug 2010 12:04:59 -0700

On 23.08.2010 19:52, Pyun YongHyeon wrote:

On Mon, Aug 23, 2010 at 12:18:01PM +0200, Andre Oppermann wrote:

On 23.08.2010 11:26, Adrian Chadd wrote:

On 23 August 2010 06:27, Pyun YongHyeon <pyu...@gmail.com> wrote:

I recall there was a SIOCSIFCAP ioctl handling bug in bce(4) on 8.0, so
it might also disable IFCAP_TSO4/IFCAP_TXCSUM/IFCAP_RXCSUM when you
disabled RX checksum offloading. But I can't explain how checksum
offloading could be related to the growth of 4k jumbo buffers.


Neither can I!

I'm trying to come up with a reproduction method that doesn't involve
"put box on the internet, push clients through it, wait."


Network drivers use 2k sized mbuf clusters on receive.  So the problem
doesn't seem to be RX related.


bce(4) is special in this regard. The controller allocates jumbo
clusters on RX if jumbo frames are used. If header splitting is
used, the driver uses normal mbuf clusters.


Didn't know that.

The function that is called on a socket write is sosend_generic() which
makes use of m_getm2().  This function allocates mbuf chains with the
tightest packing it can achieve.  It will make use of 4k (page size) mbuf
clusters
as much as it can.  This is where they come from.
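
To make the packing behaviour described above concrete, here is a rough
userland model. It is only a sketch, not the kernel code: sim_pack() and
every SIM_* constant are hypothetical names, and the sizes are assumed
typical values rather than figures taken from this thread. The assumed
policy is: take a 4k cluster while more than 2k is left, a 2k cluster for
a mid-sized remainder, and a plain mbuf for a small tail.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes for illustration only; real values depend on the platform. */
static const size_t SIM_MLEN         = 224;   /* payload room in a plain mbuf */
static const size_t SIM_MINCLSIZE    = 225;   /* smallest request given a cluster */
static const size_t SIM_MCLBYTES     = 2048;  /* standard 2k cluster */
static const size_t SIM_MJUMPAGESIZE = 4096;  /* page-sized 4k jumbo cluster */

/* How many buffers of each kind a simulated chain ended up using. */
struct chain_stats {
    size_t plain;   /* plain mbufs, no cluster attached */
    size_t cl2k;    /* 2k clusters */
    size_t cl4k;    /* 4k jumbo clusters */
};

/*
 * Hypothetical model of a "tightest packing" allocator: pick the largest
 * buffer that the remaining length justifies, then repeat for the rest.
 */
struct chain_stats sim_pack(size_t len)
{
    struct chain_stats s = {0, 0, 0};
    size_t requested = len;
    size_t capacity = 0;

    while (len > 0) {
        size_t room;

        if (len > SIM_MCLBYTES) {
            s.cl4k++;
            room = SIM_MJUMPAGESIZE;
        } else if (len >= SIM_MINCLSIZE) {
            s.cl2k++;
            room = SIM_MCLBYTES;
        } else {
            s.plain++;
            room = SIM_MLEN;
        }
        capacity += room;
        len -= (len < room) ? len : room;
    }

    /* The chain must always be able to hold the whole request. */
    assert(capacity >= requested);
    return s;
}
```

Under these assumptions any socket write larger than 2k is served almost
entirely from the 4k zone, which would be consistent with a busy sender
putting pressure on 4k clusters rather than on 2k ones.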

It seems the 4k clusters do not get freed back to the pool after they've
been sent by the NIC and dropped from the socket buffer after the ACK has
arrived.  The leak must occur in one of these two places.  The socket
buffer is unlikely as it would affect not just you but everyone else too.
Thus the mbuf freeing after DMA/tx in the bce(4) driver is the prime
suspect.


I know bce(4) has a couple of bugs in the TX path (wrong DMA tag, lack of
bus_dmamap_sync(9), etc.), but this is the same code path with or without
TX checksum offloading. This is one of the reasons why I still do not
understand what's really happening here. TX checksum offloading may
add frame processing time, since the internal FIFO has to be filled to
compute the checksum before the frame is transmitted on the wire, and
that can change the timing of the TX path. This timing change might
trigger the TX path bug. It's just a vague guess, though.


Had a chat with clau...@openbsd and he said that the bce(4) DMA engine
can only access the first 1GB of physical RAM and has to use bounce
buffers all the time.  Maybe this is related.
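
For illustration, the condition under which a buffer would have to be
bounced can be sketched as below. This is a hypothetical helper, not code
from bce(4) or busdma: sim_needs_bounce() is a made-up name, and the
address limit is passed in as a parameter because the 1GB figure above is
second-hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: returns true if any byte of the physical range
 * [paddr, paddr + len) lies at or above `limit`, meaning a device that
 * can only address memory below `limit` could not DMA to it directly.
 */
bool sim_needs_bounce(uint64_t paddr, uint64_t len, uint64_t limit)
{
    assert(limit > 0);

    if (len == 0)
        return false;               /* empty range, nothing to map */
    if (paddr >= limit)
        return true;                /* starts outside the reachable window */
    /* Written this way to avoid overflow in paddr + len. */
    return len > limit - paddr;
}
```

With a limit of 1ULL << 30, a 4k cluster that ends exactly at the 1GB
boundary would pass, while one that crosses or sits above it would need a
bounce buffer.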

--
Andre
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
