I understand that you're using ofed-1.5. We should check if those bugs exist in 1.4.1 too - since there were many changes in SDP in ofed-1.5.
Please open bugs in Bugzilla here: https://bugs.openfabrics.org

Thanks,
Amir

On 07/01/2009 04:36 PM, Lars Ellenberg wrote:
> On Wed, Jul 01, 2009 at 04:02:17PM +0300, Amir Vadai wrote:
> Subject: Re: [patch] fix SDP page leak in sdp_bz_cleanup
> In-Reply-To: <[email protected]>
>> Hi Lars,
>>
>> This is the right place for posting patches.
>>
>> I will commit it ASAP into both branches.
>
> Thanks for that one.
>
> Now, let me summarize some other findings.
>
> == off-by-one error, data corruption ==
>
> I think that "sometimes" you lose the last byte of a fragment.
>
> situation: multi core, mlx4_ib driver,
> IPoIB configured, SDP configured, more details on request ;)
>
> Do large message traffic on several streaming sockets
> at the same time, using as much bandwidth as possible,
> some on IPoIB, some on SDP.
>
> "sometimes" (typically within a couple of minutes), when receiving the
> stream, the last byte of some fragment is missing, or replaced by the
> first byte of the next fragment (if any).
>
> This has been noticed when using SDP from kernel space (for DRBD),
> and reproduced in userland.
>
> I will provide two simple perl scripts (server and client) today or
> tomorrow, so you should be able to reproduce this yourself in userland.
>
> It does not occur (within my patience time span) if there is not much
> load, or if I only use one stream, or even if I only use SDP (and not
> simultaneously also IPoIB streams). It only happens on SDP streams.
>
> I'm not sure if this off-by-one happens during send or recv.
>
> I'm open to suggestions to aid in tracking it down.
>
>
> == module count imbalance ==
>
> After modprobe, the module usage count of ib_sdp is 0, as it should be.
> Starting to use it with some streaming sockets, the module count goes up.
>
> Once the streams start disconnecting, being interrupted from the other
> side, reconnecting and similar stuff, the module count quickly drops below
> zero, manifesting in lsmod showing a module count of 4.2 million ;)
>
> I'm still trying to track this down, I'm not yet sure if it is a double
> module_put, or a missing (try_)module_get ...
>
>
> more when I find more.
>
> Cheers,
>
--
Amir Vadai
Software Eng.
Mellanox Technologies
mailto: [email protected]
Tel +972-3-6259539
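
For the off-by-one report above, one way to pin down where a byte goes missing is to send a pattern in which every byte value depends on its absolute offset in the stream, and have the receiver report the first offset that does not match. The following is only a minimal sketch of that idea, not the perl scripts mentioned in the thread; the helper names (pattern_byte, make_chunk, first_mismatch), the modulus 251 and the 1000-byte fragment size are all assumptions chosen for illustration, and the "stream" here is an in-memory buffer rather than a real SDP socket.

```python
# Hypothetical helpers for illustration; none of these names come from the thread.

def pattern_byte(offset: int) -> int:
    """Expected byte value at an absolute stream offset.

    251 is prime, so the pattern does not repeat in step with
    power-of-two fragment sizes, and neighbouring bytes always differ.
    """
    return offset % 251


def make_chunk(start: int, length: int) -> bytes:
    """Build `length` pattern bytes beginning at stream offset `start`."""
    return bytes(pattern_byte(o) for o in range(start, start + length))


def first_mismatch(received: bytes, start: int = 0):
    """Return the first stream offset whose byte is wrong, or None if all match."""
    for i, value in enumerate(received):
        if value != pattern_byte(start + i):
            return start + i
    return None


FRAG = 1000                      # assumed fragment size, for the demo only
stream = make_chunk(0, 2 * FRAG)

# Case 1: the last byte of the first fragment is lost.
dropped = stream[:FRAG - 1] + stream[FRAG:]

# Case 2: the last byte of the first fragment is replaced by the
# first byte of the next fragment; the total length is unchanged.
replaced = stream[:FRAG - 1] + stream[FRAG:FRAG + 1] + stream[FRAG:]

print("intact  :", first_mismatch(stream))
print("dropped :", first_mismatch(dropped), "length", len(dropped))
print("replaced:", first_mismatch(replaced), "length", len(replaced))
```

In a real test the sender would write make_chunk() output to the socket and the receiver would call first_mismatch() on each buffer with a running offset. Comparing the reported offset with the fragment boundaries, together with the total byte count, distinguishes a lost byte from a replaced one, though it does not by itself tell whether the damage happened on send or on recv.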
