I've been trudging through a set of netperf tests with OFED 1.2, and came to a point where I was running concurrent netperf bidirectional tests through both ports of:

03:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) (rev 20)


I configured ib0 and ib1 into separate IP subnets and ran the "bidirectional TCP_RR" test (./configure --enable-burst, large socket buffers, large req/rsp sizes, and a burst of 12 transactions in flight at one time). The results were rather even: each connection achieved about the same performance.
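
For reference, that TCP_RR run used essentially the same invocation as the SDP_RR runs shown below, just with the test type swapped; it was something like this (from memory rather than a literal cut-and-paste):

# one netperf per port/subnet, run concurrently; same socket buffer,
# request/response size and burst settings as the SDP_RR command further down
for i in 1 3 ; do
    netperf -H 192.168.$i.107 -B "port $i" -l 60 -P 0 -v 0 \
            -t TCP_RR -- -s 1M -S 1M -r 64K -b 12 &
done
wait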

However, when I run the same test over SDP, some connections seem to get much better performance than others. For example, with two concurrent connections, one over each port, one will get a much higher result than the other.

Here are four iterations of a pair of SDP_RR tests, one across each of the two ports of the HCA (i.e. two concurrent netperfs, run four times in a row). What this calls port "1" is running over ib0 and what it calls "3" is running over ib1 (1 and 3 were the subnet numbers and simply convenient tags). The units are transactions per second; process completion notification messages have been trimmed for readability:

[EMAIL PROTECTED] netperf2_work]# for j in 1 2 3 4; do echo; for i in 1 3 ; do netperf -H 192.168.$i.107 -B "port $i" -l 60 -P 0 -v 0 -t SDP_RR -- -s 1M -S 1M -r 64K -b 12 & done;wait;done

2294.65 port 1
10003.66 port 3

 398.63 port 1
11898.55 port 3

 269.73 port 3
12025.79 port 1

 478.29 port 3
11819.61 port 1

It doesn't seem that the favoritism is pegged to a specific port since they traded places there in the middle.

Now, if I reload the ib_sdp module and set recv_poll and send_poll to 0, I get this behaviour:

[EMAIL PROTECTED] netperf2_work]# for j in 1 2 3 4; do echo; for i in 1 3 ; do netperf -H 192.168.$i.107 -B "port $i" -l 60 -P 0 -v 0 -t SDP_RR -- -s 1M -S 1M -r 64K -b 12 & done;wait;done


6132.89 port 1
6132.79 port 3

6127.32 port 1
6127.27 port 3

6006.84 port 1
6006.34 port 3

6134.83 port 1
6134.29 port 3
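
For completeness, the recv_poll/send_poll change above is just a reload of ib_sdp with those parameters zeroed; something along these lines, assuming they are plain load-time module parameters:

# sketch of the reload; parameter names as above, exact invocation assumed
rmmod ib_sdp
modprobe ib_sdp recv_poll=0 send_poll=0

They may also show up after loading under /sys/module/ib_sdp/parameters/ for anyone who wants to check or flip them without a full reload.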


I guess it is possible for one of the netperfs or netservers to spin in a way that precludes the other from running, even though I have four cores on the system. For additional grins, I pinned each netperf/netserver to its own CPU, with send_poll and recv_poll put back to their defaults (unloaded and reloaded the ib_sdp module).
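
For anyone wanting to double-check that sort of placement, something like this works (just a sanity check, not something captured in the output below):

# locally: show which CPU (PSR column) each netperf is currently running on
ps -o pid,psr,comm -C netperf
# and the same with -C netserver on the remote side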

[EMAIL PROTECTED] netperf2_work]# for j in 1 2 3 4; do echo; for i in 1 3 ; do netperf -T $i -H 192.168.$i.107 -B "port $i" -l 60 -P 0 -v 0 -t SDP_RR -- -s 1M -S 1M -r 64K -b 12 & done;wait;done

10108.65 port 1
2187.80 port 3

7754.14 port 3
4541.81 port 1

7013.78 port 3
5282.01 port 1

6499.44 port 3
5796.42 port 1


And I still see this apparent starvation of one of the connections, although overall it isn't as bad as without the binding, so I guess it isn't something one can work around via CPU-binding trickery. Is this behaviour expected?

rick jones
