I've been trying to debug a problem where, if my GLDv3 driver is
unplumbed while under heavy transmit load, the receiver sees
a handful of frames with bad TCP checksums. This problem disappears
when I disable LSO. I think I understand what's happening, and I
want to get a wider opinion.
It is important to note that our hardware uses both partial checksums
and LSO. Further, it assumes that if LSO is enabled on a packet, then
partial checksum offload is also enabled. This means that it expects
to find the TCP pseudo header checksum in the TCP header of all LSO
frames.
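For context, the pseudo-header sum in question is just the 16-bit
ones'-complement sum of the source address, destination address,
protocol, and TCP length. A minimal sketch (the helper name is mine;
note that whether the TCP length belongs in the sum for LSO frames is
hardware-dependent, since some LSO engines rewrite the length per
segment and expect it excluded):

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute the IPv4 TCP pseudo-header checksum,
 * i.e. the folded 16-bit ones'-complement sum of the source address,
 * destination address, protocol, and TCP length.  This is the partial
 * sum the stack normally leaves in the TCP header's checksum field
 * when partial checksum offload is enabled, and what LSO hardware
 * like ours expects to find there.
 */
static uint16_t
tcp_pseudo_cksum(uint32_t saddr, uint32_t daddr, uint8_t proto,
    uint16_t tcp_len)
{
	uint32_t sum;

	sum  = (saddr >> 16) + (saddr & 0xffff);
	sum += (daddr >> 16) + (daddr & 0xffff);
	sum += proto;
	sum += tcp_len;

	/* Fold the carries back in (ones'-complement addition). */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return ((uint16_t)sum);
}
```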
What seems to be happening is that when the interface is unplumbed,
ill_capability_reset() is called, and then does this:
ill_capability_mdt_reset(ill, &sc_mp);
ill_capability_hcksum_reset(ill, &sc_mp);
ill_capability_zerocopy_reset(ill, &sc_mp);
ill_capability_ipsec_reset(ill, &sc_mp);
ill_capability_dls_reset(ill, &sc_mp);
ill_capability_lso_reset(ill, &sc_mp);
This seems to leave the ill in a "bad" state (at least for me)
where ILL_CAPAB_LSO (0x80) is set, but ILL_CAPAB_HCKSUM (0x8)
is not. This will cause our NIC to send out frames with bad
TCP checksums, as it will assume the full checksum found in the
TCP header is the pseudo hdr sum, and will corrupt it.
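To spell out the state my driver would have to detect (flag values as
quoted above and in the kmdb dump below; the predicate is mine, not
anything in ip):

```c
#include <stdint.h>

/* Capability bits, values as quoted from ip's ill_capabilities. */
#define	ILL_CAPAB_HCKSUM	0x08
#define	ILL_CAPAB_LSO		0x80

/*
 * The inconsistent state left behind by ill_capability_reset():
 * LSO still advertised while hardware checksum offload has already
 * been torn down, breaking the "LSO implies partial checksum"
 * assumption our hardware makes.
 */
static int
lso_without_hcksum(uint64_t caps)
{
	return ((caps & ILL_CAPAB_LSO) != 0 &&
	    (caps & ILL_CAPAB_HCKSUM) == 0);
}
```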
I think the bug could be fixed if the ill_capability_lso_reset() call
were moved before ill_capability_hcksum_reset(). Alternatively,
I could hack my driver to re-calculate the TCP pseudo hdr sum for
every single packet I send.
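The driver hack would look roughly like this (helper names and calling
convention are mine; in the real transmit path the addresses and TCP
length would be pulled out of the mblk headers, and the stored value
would need the usual byte-order handling):

```c
#include <stdint.h>

/* Folded ones'-complement sum of the IPv4 TCP pseudo-header fields. */
static uint16_t
tcp_pseudo_cksum(uint32_t saddr, uint32_t daddr, uint8_t proto,
    uint16_t tcp_len)
{
	uint32_t sum;

	sum  = (saddr >> 16) + (saddr & 0xffff);
	sum += (daddr >> 16) + (daddr & 0xffff);
	sum += proto;
	sum += tcp_len;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)sum);
}

/*
 * Per-packet workaround sketch: unconditionally replace whatever the
 * stack left in the TCP checksum field (possibly a full checksum,
 * possibly garbage after the capability reset) with a freshly
 * computed pseudo-header sum before the frame reaches the LSO engine.
 */
static void
lso_fixup_cksum(uint16_t *tcp_cksum_field, uint32_t saddr,
    uint32_t daddr, uint16_t tcp_len)
{
	*tcp_cksum_field =
	    tcp_pseudo_cksum(saddr, daddr, 6 /* IPPROTO_TCP */, tcp_len);
}
```

The obvious downside is paying for this on every LSO frame, even
though the window where the ill is in the inconsistent state is only
during unplumb.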
So... Is this a bug in Solaris, or should I just hack the driver?
Thanks,
Drew
PS:
Here is the "proof":
[3]> ill_capability_lso_reset::bp
[3]> ::cont
kmdb: stop at ip`ill_capability_lso_reset
kmdb: target stopped at:
ip`ill_capability_lso_reset: pushq %rbp
[1]> ::stack
ip`ill_capability_lso_reset(ffffff01f64a77e8, ffffff00084d2360)
<....>
[1]> ffffff01f64a77e8::print ill_t ill_capabilities
ill_capabilities = 0x80
[1]> ::cont
_______________________________________________
networking-discuss mailing list
[email protected]