Hi Krishna,

Krishna Kumar2 wrote:
What I will do today/tomorrow is to run rev5 (which I didn't run
for mthca) on both ehca and mthca, get statistics and send them out.
Otherwise what you stated is correct as far as rev4 goes. After giving
the latest details, I would appreciate any help from the Mellanox developers.

Good, please test with rev5 and let us know.

Correct, for every 1 retransmission in the regular code, I see two
retransmissions in the batching case (which I assume is due to overflow at the
receiver side, as I sometimes batch up to 4K skbs). I will post the exact
numbers in the next post.

Transmission of 4K batched packets sounds like a real problem for the receiver side: with a 0.5K send/recv queue size, that's 8 batches of 512 packets each, where for each RX packet there is a completion (WC) to process and an SKB to allocate and post to the QP, whereas on the TX side there is only the posting to the QP, processing one (?) WC and freeing 512 SKBs.

If the situation is indeed that asymmetrical, I am starting to think that the CPU utilization at the sender side might be much higher with batching than without batching. Have you looked into that?
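
A rough way to check this (just a sketch, assuming the sysstat tools, i.e. mpstat, are installed on the sender) is to sample CPU utilization while the iperf run is in flight and compare the batching and no-batching cases:

    # sample aggregate CPU usage once a second for 30 seconds during the run
    mpstat 1 30

    # or per-CPU, to see whether a single core is saturated by the TX path
    mpstat -P ALL 1 30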

I was using 2.6.23-rc1 on the receiver (which also has NAPI, but uses the
old API - the same fn ipoib_poll()).

I am not with you. Looking at 2.6.22 and 2.6.23-rc5, in both the ipoib NAPI mechanism is implemented through the function ipoib_poll, which serves as the polling API for the network stack, so what is the old API and where does this difference exist?
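
If it helps, you can check what the receiver kernel actually does by grepping the ipoib sources of the tree you are running (just a sketch, assuming you have that source tree at hand; paths are the in-tree drivers/infiniband/ulp/ipoib files):

    # shows where ipoib_poll is defined and how it is wired into the stack
    grep -n 'ipoib_poll\|netif_rx' drivers/infiniband/ulp/ipoib/ipoib_ib.c drivers/infiniband/ulp/ipoib/ipoib_main.c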

This is TCP (without No Delay), datagram mode; I didn't change the MTU from
the default (is it 2K?).  The command is iperf with various options for different
test buffer sizes/threads.

You might want to try something lighter such as an iperf UDP test, where a nice criterion would be to compare bandwidth AND packet loss between no-batching and batching. As for the MTU, the default is indeed 2K (2044), but it's always better to just know the facts, namely what the MTU actually was during the test.
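
A minimal sketch of such a run (assuming iperf is installed on both sides, the IPoIB interface is ib0 and the receiver is reachable at 192.168.0.2 - both are just placeholders):

    # receiver: UDP server, report bandwidth and loss every second
    iperf -s -u -i 1

    # sender: UDP stream at 1000 Mbit/s for 30 seconds
    iperf -c 192.168.0.2 -u -b 1000M -t 30

    # and record the MTU that was actually in effect during the test
    cat /sys/class/net/ib0/mtu

The server report at the end of each run gives you the loss percentage to compare between the two cases.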

Regarding id/etc, this is what dmesg has:

If you have the user space libraries installed, load ib_uverbs and run the command ibv_devinfo; you will see all the InfiniBand devices on your system and, for each, its device ID and firmware version. If not, you should be looking at

/sys/class/infiniband/$device/hca_type
and
/sys/class/infiniband/$device/fw_ver
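
For example (just a sketch; the grep keys are the field names ibv_devinfo prints, and not every driver populates every sysfs file):

    # user space path
    modprobe ib_uverbs
    ibv_devinfo | grep -E 'hca_id|fw_ver'

    # sysfs path, for every device present
    for d in /sys/class/infiniband/*; do
            echo "$d: $(cat $d/hca_type 2>/dev/null), fw $(cat $d/fw_ver 2>/dev/null)"
    done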

Sep 16 22:49:26 elm3b39 kernel: eHCA Infiniband Device Driver (Rel.:
SVNEHCA_0023)

There are *fw* files for mthca0, but I don't see any for ehca in /sys/class, so
I am not sure (since these are PCI-E cards, nothing shows up in lspci -v).
What should I look for?

The above print seems to be from the ehca driver, while you are talking about mthca0, which is quite confusing. If you want to be sure which HCA is being used by the netdevice you are testing with (e.g. ib0), take a look at the directory /sys/class/net/$netdevice/device/
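
For example (a sketch, assuming the netdevice under test is ib0; the exact sysfs layout can differ a bit between kernel versions):

    # the device link points back at the HCA behind this netdevice
    readlink -f /sys/class/net/ib0/device
    ls /sys/class/net/ib0/device/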

If you have an HCA which is not reported in lspci and/or in /sys/class/infiniband, it sounds like you either have a problem or you have found a bug.

Or.
