> OTOH it is quite possible that ipoib is corrupting an skb somehow so
> that when it gets reused by e1000, you see a crash. The fact that you
> were running netperf on IB when e1000 crashed is somewhat suspicious.

Yes, exactly the lingering suspicion that I had. I ran several iterations of netperf on e1000 and there were no crashes, so I started looking at the patch more closely. I think I am on to something now.

In ipoib_cm_handle_rx_wc() I see two things (I have not yet looked at the latest changes that you mentioned earlier today):

1. I do not understand the usage and purpose of recv_count (something new that you have introduced). Can you please explain? My suspicion is that if the if clause is somehow executed, the rx_ring gets freed, and so all the skb pointers are bogus. I have commented out this segment of code.

2. The call to ipoib_cm_alloc_rx_skb() in ipoib_cm_handle_rx_wc() uses a hard-coded index value of 0, which is incorrect for the no-SRQ case. I have changed that to index instead.

I have been running this for some hours now; no crashes and no errors. This is using SLUB. If I get a chance I will run with SLAB over the weekend and let you know the results.

Pradeep
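For illustration only, here is a minimal user-space sketch of the indexing problem in point 2. It is not the ipoib code: struct rx_ring, handle_completion(), RING_SIZE and BUF_LEN are hypothetical names, and the sketch assumes that each completion must refill the same ring slot that it drained.

```c
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE 4    /* hypothetical ring depth */
#define BUF_LEN   64   /* hypothetical buffer size */

/* Hypothetical stand-in for a per-connection receive ring:
 * one posted buffer per slot. */
struct rx_ring {
    void *buf[RING_SIZE];
};

/*
 * Handle a completion for slot `index` and refill slot `refill_slot`.
 * Returns the completed buffer, which the caller now owns and will
 * eventually free, or NULL if no replacement could be allocated
 * (the ring is left unchanged in that case).
 *
 * This is only correct when refill_slot == index. Passing a fixed 0
 * leaves ring->buf[index] pointing at a buffer the caller is about to
 * free, and overwrites (loses) whatever was posted in slot 0.
 */
static void *handle_completion(struct rx_ring *ring, size_t index,
                               size_t refill_slot)
{
    void *done, *fresh;

    if (index >= RING_SIZE || refill_slot >= RING_SIZE)
        return NULL;

    fresh = malloc(BUF_LEN);
    if (!fresh)
        return NULL;

    done = ring->buf[index];          /* buffer that just completed */
    ring->buf[refill_slot] = fresh;   /* replacement goes here */
    return done;
}
```

Calling handle_completion(&ring, 2, 2) leaves slot 2 holding a fresh buffer. Calling handle_completion(&ring, 3, 0) leaves slot 3 still pointing at the buffer that was just handed to the caller, which is the kind of stale pointer that could later show up as corruption in an unrelated driver.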
