I've been trudging through a set of netperf tests with OFED 1.2, and came to a point where I was running concurrent netperf bidirectional tests through both ports of:

03:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) (rev 20)


I configured ib0 and ib1 into separate IP subnets and ran the "bidirectional TCP_RR" test (./configure --enable-burst, large socket buffers, large req/rsp sizes, and a burst of 12 transactions in flight at one time). The results were rather even: each connection achieved about the same performance.
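
For reference, that TCP_RR run used essentially the same invocation as the SDP_RR runs shown below, just with the test type swapped; it was something like this (from memory rather than a literal cut-and-paste):

# one netperf per port/subnet, run concurrently; same socket buffer,
# request/response size and burst settings as the SDP_RR command further down
for i in 1 3 ; do
    netperf -H 192.168.$i.107 -B "port $i" -l 60 -P 0 -v 0 \
            -t TCP_RR -- -s 1M -S 1M -r 64K -b 12 &
done
wait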

However, when I run the same test over SDP, some connections seem to get much better performance than others. For example, with two concurrent connections, one over each port, one will get a much higher result than the other.

Here are four iterations of a pair of SDP_RR tests, one across each of the two ports of the HCA (i.e. two concurrent netperfs, run four times in a row). What this calls port "1" is running over ib0 and what it calls "3" is running over ib1 (1 and 3 were the subnet numbers and simply convenient tags). The units are transactions per second; process completion notification messages have been trimmed for readability:

[EMAIL PROTECTED] netperf2_work]# for j in 1 2 3 4; do echo; for i in 1 3 ; do netperf -H 192.168.$i.107 -B "port $i" -l 60 -P 0 -v 0 -t SDP_RR -- -s 1M -S 1M -r 64K -b 12 & done;wait;done

2294.65 port 1
10003.66 port 3

 398.63 port 1
11898.55 port 3

 269.73 port 3
12025.79 port 1

 478.29 port 3
11819.61 port 1

It doesn't seem that the favoritism is pegged to a specific port since they traded places there in the middle.

Now, if I reload the ib_sdp module and set recv_poll and send_poll to 0, I get this behaviour:

[EMAIL PROTECTED] netperf2_work]# for j in 1 2 3 4; do echo; for i in 1 3 ; do netperf -H 192.168.$i.107 -B "port $i" -l 60 -P 0 -v 0 -t SDP_RR -- -s 1M -S 1M -r 64K -b 12 & done;wait;done


6132.89 port 1
6132.79 port 3

6127.32 port 1
6127.27 port 3

6006.84 port 1
6006.34 port 3

6134.83 port 1
6134.29 port 3
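
For completeness, the recv_poll/send_poll change above is just a reload of ib_sdp with those parameters zeroed; something along these lines, assuming they are plain load-time module parameters:

# sketch of the reload; parameter names as above, exact invocation assumed
rmmod ib_sdp
modprobe ib_sdp recv_poll=0 send_poll=0

They may also show up after loading under /sys/module/ib_sdp/parameters/ for anyone who wants to check or flip them without a full reload.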


I guess it is possible for one of the netperfs or netservers to spin in a way that precludes the other from running, even though I have four cores on the system. For additional grins, I pinned each netperf/netserver to its own CPU, with send_poll and recv_poll put back to their defaults (unloaded and reloaded the ib_sdp module).
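
For anyone wanting to double-check that sort of placement, something like this works (just a sanity check, not something captured in the output below):

# locally: show which CPU (PSR column) each netperf is currently running on
ps -o pid,psr,comm -C netperf
# and the same with -C netserver on the remote side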

[EMAIL PROTECTED] netperf2_work]# for j in 1 2 3 4; do echo; for i in 1 3 ; do netperf -T $i -H 192.168.$i.107 -B "port $i" -l 60 -P 0 -v 0 -t SDP_RR -- -s 1M -S 1M -r 64K -b 12 & done;wait;done

10108.65 port 1
2187.80 port 3

7754.14 port 3
4541.81 port 1

7013.78 port 3
5282.01 port 1

6499.44 port 3
5796.42 port 1


And I still see this apparent starvation of one of the connections, although overall it isn't as bad as without the binding, so I guess it isn't something one can work around via CPU-binding trickery. Is this behaviour expected?

rick jones
