Marco Barbero wrote:
I'm experiencing a nasty kernel soft lockup on one cluster. I have to
say I have tons of clusters using the same config, and all of them are working fine.
I recently experienced cluster lockup issues because my Ethernet
adapters (4 per server), which were bonded into "bond0", inadvertently had
some options enabled which "appear to be incompatible" with intra-cluster
communications, i.e.:
tx-checksumming was on; and
scatter-gather was on; and
TSO (TCP segmentation offload) was on; and
generic segmentation offload was on.
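To see which of the four options listed above are currently on for an interface, you can read the output of `ethtool -k` (lowercase `-k` shows the settings; uppercase `-K` changes them). This is a minimal sketch, assuming a POSIX shell with awk; the function name `offloads_on` is hypothetical, and the feature labels are assumed to match what ethtool prints (newer versions use hyphens, e.g. "tcp-segmentation-offload", older ones use spaces, so the pattern accepts both).

```shell
# Hypothetical helper: read `ethtool -k <iface>` output on stdin and print
# the names of the four offload features from the post that are "on".
# Indented sub-features (e.g. "\ttx-checksum-ipv4") are ignored because the
# pattern is anchored at the start of the line.
offloads_on() {
    awk -F': ' '
        $1 ~ /^(tx-checksumming|scatter-gather|tcp[- ]segmentation[- ]offload|generic[- ]segmentation[- ]offload)$/ {
            # values look like "on", "off" or "on [fixed]"; keep the first word
            split($2, v, " ")
            if (v[1] == "on") print $1
        }'
}
```

Usage, e.g. for eth0: `ethtool -k eth0 | offloads_on`. An empty result means none of the four options is enabled.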
I noted that these options can cause problems on both bonded and
non-bonded interfaces. I verified the error using Wireshark: I saw TCP and
UDP checksum errors coming from the one server with these options set.
If you do not bond, you can turn these off with (replace ethX with the interface name):
ethtool -K ethX tx off sg off tso off gso off
If you bond, you'll probably be unable to change these on the bond interface
with ethtool; instead, turn them off on the ethX interfaces themselves (even
though they are slaves), and the bonded interface will pick up the change
automatically. I verified the fix using Wireshark: the server with the
checksum errors is now behaving nicely.
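For the bonded case, the slave list can be read from sysfs so that every slave gets the same ethtool -K command. This is a minimal sketch, assuming the Linux bonding driver exposes a space-separated slave list in /sys/class/net/<bond>/bonding/slaves; the function name `disable_offloads_on_slaves` and the SYSFS_NET/ETHTOOL overrides are hypothetical, added only so the script can be dry-run (e.g. with ETHTOOL="echo ethtool").

```shell
# Hypothetical helper: turn off tx/sg/tso/gso on every slave of a bond,
# rather than on the bond device itself.
# SYSFS_NET and ETHTOOL are optional overrides for dry runs (assumption,
# not part of ethtool or the bonding driver).
disable_offloads_on_slaves() {
    bond=${1:-bond0}
    sysfs=${SYSFS_NET:-/sys/class/net}
    ethtool_cmd=${ETHTOOL:-ethtool}
    slaves_file="$sysfs/$bond/bonding/slaves"

    if [ ! -r "$slaves_file" ]; then
        echo "no bonding slave list at $slaves_file" >&2
        return 1
    fi

    # The slaves file holds names like "eth0 eth1"; word-split on whitespace.
    # $ethtool_cmd is deliberately unquoted so "echo ethtool" works as a dry run.
    for slave in $(cat "$slaves_file"); do
        $ethtool_cmd -K "$slave" tx off sg off tso off gso off || return 1
    done
}
```

Run it as root, e.g. `disable_offloads_on_slaves bond0`, then check each slave with `ethtool -k ethX` to confirm the bond has picked up the change.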
hth
yvette hirth
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user