Hi.

Are you using the latest released FW for this device?

Thanks,
Dotan

Marcel Heinz wrote:
Hi,

I have ported an application to use InfiniBand multicast directly via libibverbs and discovered very low multicast throughput, only ~250 MByte/s, although we are using 4x DDR components.

To rule out any effects of the application, I've created a small benchmark (well, it's only a hack). It simply tries to keep the send/receive queues filled with work requests and polls the CQ in an endless loop. In server mode, it joins/creates the multicast group as a FullMember, attaches the QP to the group and receives any packets. The client joins as a SendOnlyNonMember and sends datagrams of full MTU size to the group. (A minimal sketch of both halves is appended below.)

The test setup is as follows:

Host A <---> Switch <---> Host B

We use Mellanox InfiniHost III Lx HCAs (MT25204) and a Flextronics F-X430046 24-port switch, OFED 1.3 and a "vanilla" 2.6.23.9 Linux kernel. The results are:

Host A          Host B      Throughput (MByte/s)
client          server       262
client          2x server    146
client+server   server       944
client+server   ---          946

As a reference: unicast ib_send_bw (in UD mode): 1146

I don't see any reason why it should become _faster_ when I additionally start a server on the same host as the client. OTOH, the 944 MByte/s sound relatively sane when compared to the unicast performance, given the additional overhead of having to copy the data locally.

These ~260 MByte/s are relatively close to the 2 GBit/s effective throughput of a 1x SDR connection. However, the created group is rate 6 (20 GBit/s), and the file /sys/class/infiniband/mthca0/ports/1/rate showed 20 Gb/sec during the whole test. The error counters of all ports show nothing abnormal. Only the RcvSwRelayErrors counter of the switch port (towards the host running the client) is increasing very fast, but this seems to be normal for multicast packets, as the switch does not relay these packets back to the source.

We were able to test on another cluster with 6 nodes (also with MT25204 HCAs; I don't know the OFED version and switch type) and got the following results:

Host1   Host2   Host3   Host4   Host5   Host6   Throughput (MByte/s)
1s      1s      1c                              255.15
1s      1s      1s      1c                      255.22
1s      1s      1s      1s      1c              255.22
1s      1s      1s      1s      1s      1c      255.22
1s1c    1s      1s                              738.64
1s1c    1s      1s      1s                      695.08
1s1c    1s      1s      1s      1s              565.14
1s1c    1s      1s      1s      1s      1s      451.90

As long as there is no server and client on the same host, it at least behaves like multicast. With both a client and a server on the same host, performance decreases as the number of servers increases, which is totally surprising to me.

Another test I did was running an ib_send_bw (UD) benchmark while the multicast benchmark was running between A and B. I got ~260 MByte/s for the multicast and also ~260 MByte/s for ib_send_bw.

Does anyone have an idea of what is going on there, or a hint what I should check?

Regards,
Marcel
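For reference, a minimal sketch of the verbs calls such a benchmark would use, assuming the SA join (which yields the group's MGID, MLID and Q_Key) is performed elsewhere, e.g. via libibumad or librdmacm. Identifiers such as qp, pd, mr and group_qkey are placeholders and not taken from Marcel's code:

/*
 * Sketch only, not the actual benchmark: server side attaches the UD QP
 * to the multicast group; client side builds one address handle for the
 * group and posts full-MTU datagrams to the well-known multicast QPN.
 */
#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

/* Server: attach the UD QP so the HCA delivers packets addressed to this
 * MGID/MLID into the QP's receive queue. */
static int attach_receiver(struct ibv_qp *qp, union ibv_gid *mgid,
                           uint16_t mlid)
{
        return ibv_attach_mcast(qp, mgid, mlid);
}

/* Client: create one address handle for the group (reused for every send);
 * multicast traffic requires a GRH, hence is_global = 1. */
static struct ibv_ah *create_group_ah(struct ibv_pd *pd, union ibv_gid *mgid,
                                      uint16_t mlid, uint8_t port)
{
        struct ibv_ah_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.is_global     = 1;
        attr.grh.dgid      = *mgid;
        attr.grh.hop_limit = 1;
        attr.dlid          = mlid;
        attr.port_num      = port;

        return ibv_create_ah(pd, &attr);
}

/* Post one datagram to the group: multicast sends go to QPN 0xFFFFFF and
 * must carry the group's Q_Key. */
static int post_mcast_send(struct ibv_qp *qp, struct ibv_ah *ah,
                           uint32_t group_qkey, struct ibv_mr *mr,
                           void *buf, uint32_t len)
{
        struct ibv_sge sge = {
                .addr   = (uintptr_t) buf,
                .length = len,          /* path MTU minus 40-byte GRH */
                .lkey   = mr->lkey,
        };
        struct ibv_send_wr wr, *bad_wr;

        memset(&wr, 0, sizeof(wr));
        wr.sg_list           = &sge;
        wr.num_sge           = 1;
        wr.opcode            = IBV_WR_SEND;
        wr.send_flags        = IBV_SEND_SIGNALED;
        wr.wr.ud.ah          = ah;
        wr.wr.ud.remote_qpn  = 0xFFFFFF;
        wr.wr.ud.remote_qkey = group_qkey;

        return ibv_post_send(qp, &wr, &bad_wr);
}

The benchmark loop would then keep reposting receives (server) or sends (client) and drain completions with ibv_poll_cq.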
