Please do not reply to this message. I am copying the general list on this bug report so that we can start the discussion by mail. I will then reply, copying the bugzilla reflector, so that "reply all" gets tracked in bugzilla.
Subject: [Bug 465] New: IPoIB CM HA fails after several hours of failures
Date: Sun, 18 Mar 2007 08:45:48 +0200
From: [EMAIL PROTECTED]

https://bugs.openfabrics.org/show_bug.cgi?id=465

           Summary: IPoIB CM HA fails after several hours of failures
           Product: OpenFabrics Linux
           Version: 1.2beta1
          Platform: X86-64
        OS/Version: All
            Status: NEW
          Severity: critical
          Priority: P2
         Component: IPoIB
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]
                CC: [EMAIL PROTECTED]

I've been trying IPoIB CM HA for a few weeks, and can't get it to run
overnight. I've tried both SLES10 (LionCub DDR) and RHEL4 (LionMini SDR and
LionMini DDR).

I run netperf 2.4.1 with large socket buffers:

netperf241 -H 192.168.2.46 -D -l 36000 -- -s 349520 -S 349520 -m 65536

While netperf is running, I start flipping IB ports once every 10 seconds.
After a few hours, I sometimes see netperf throughput drop to almost zero:

Interim result: 1911.72 10^6bits/s over 2.52 seconds
Interim result: 4823.63 10^6bits/s over 1.00 seconds
Interim result: 4816.90 10^6bits/s over 1.00 seconds
Interim result: 4820.21 10^6bits/s over 1.00 seconds
Interim result: 4816.85 10^6bits/s over 1.00 seconds
Interim result: 4818.13 10^6bits/s over 1.00 seconds
Interim result:  324.99 10^6bits/s over 14.83 seconds
Interim result: 4811.39 10^6bits/s over 1.00 seconds
Interim result: 4817.64 10^6bits/s over 1.00 seconds
Interim result: 4812.06 10^6bits/s over 1.00 seconds
Interim result: 4809.26 10^6bits/s over 1.00 seconds
Interim result: 4817.21 10^6bits/s over 1.00 seconds
Interim result:   85.80 10^6bits/s over 56.14 seconds
Interim result: 1910.76 10^6bits/s over 2.52 seconds
Interim result: 4813.64 10^6bits/s over 1.00 seconds
Interim result: 4813.03 10^6bits/s over 1.00 seconds
Interim result: 4807.23 10^6bits/s over 1.00 seconds
Interim result: 4810.83 10^6bits/s over 1.00 seconds
Interim result: 4813.61 10^6bits/s over 1.00 seconds
Interim result:  272.39 10^6bits/s over 17.67 seconds
Interim result: 4816.57 10^6bits/s over 1.00 seconds
Interim result: 4810.02 10^6bits/s over 1.00 seconds
Interim result: 4809.88 10^6bits/s over 1.00 seconds
Interim result:   17.63 10^6bits/s over 278.01 seconds
Interim result:    0.21 10^6bits/s over 30.58 seconds
Interim result:    0.33 10^6bits/s over 14.20 seconds
Interim result:    0.45 10^6bits/s over 13.90 seconds
Interim result:    0.11 10^6bits/s over 56.20 seconds
Interim result:    0.34 10^6bits/s over 13.95 seconds
Interim result:    0.89 10^6bits/s over 14.21 seconds
Interim result:    0.11 10^6bits/s over 55.17 seconds
Interim result:    0.08 10^6bits/s over 56.20 seconds
Interim result:    0.20 10^6bits/s over 32.14 seconds
Interim result:    1.00 10^6bits/s over 6.30 seconds
Interim result:    0.37 10^6bits/s over 17.03 seconds
Interim result:    1.74 10^6bits/s over 7.25 seconds
Interim result:    0.02 10^6bits/s over 345.16 seconds
Interim result:    0.10 10^6bits/s over 112.83 seconds
Interim result:    0.45 10^6bits/s over 13.91 seconds
Interim result:    0.68 10^6bits/s over 6.91 seconds
Interim result:    0.06 10^6bits/s over 112.48 seconds
Interim result:    0.10 10^6bits/s over 60.32 seconds
Interim result:    0.43 10^6bits/s over 14.55 seconds

Other times netperf hangs or fails. Restarting netperf as is never works.
Sometimes I can restart netperf with default socket buffer sizes.

----- End forwarded message -----

-- 
MST
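
For anyone who wants to scan a long netperf -D log for the throughput drops described in the report, here is a minimal sketch in Python. It assumes the interim lines have exactly the "Interim result: ... 10^6bits/s over ... seconds" format quoted in the report. The function names and the 100 x 10^6 bits/s threshold are illustrative choices, not part of netperf or of the bug report.

```python
import re

# Matches lines such as:
#   Interim result: 4816.57 10^6bits/s over 1.00 seconds
INTERIM_RE = re.compile(
    r"Interim result:\s+([0-9.]+)\s+10\^6bits/s over\s+([0-9.]+)\s+seconds"
)


def parse_interim(text):
    """Return a list of (rate_mbps, duration_s) tuples found in the log text."""
    return [(float(rate), float(dur)) for rate, dur in INTERIM_RE.findall(text)]


def find_stalls(text, threshold_mbps=100.0):
    """Return (index, rate_mbps, duration_s) for samples below the threshold.

    threshold_mbps is a hypothetical cutoff; pick one that suits the link.
    """
    return [
        (i, rate, dur)
        for i, (rate, dur) in enumerate(parse_interim(text))
        if rate < threshold_mbps
    ]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical log file name): python find_stalls.py < netperf.log
    for index, rate, dur in find_stalls(sys.stdin.read()):
        print(f"sample {index}: {rate:.2f} 10^6bits/s over {dur:.2f} seconds")
```

Because the regular expression tolerates arbitrary whitespace between fields, it also matches records that were wrapped across lines in a mail archive.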
