So, while playing around with my new netperf SDP_RR test I've noticed that a single-byte _RR test over SDP gets a much higher transaction rate (i.e. lower latency) than TCP over the same HCA, but the CPU utilization is _very_ much higher, and so is the service demand (CPU per transaction). Higher CPU util makes sense with a higher transaction rate, but the increased service demand does not, at least not in my experience thus far.

[EMAIL PROTECTED] ~]# for i in SDP_RR TCP_RR; do netperf -t $i -l 60 -c -C -H 192.168.0.107;done
SDP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.107 (192.168.0.107) port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr

126976 126976 1       1      60.00   37868.61  28.02  27.65  29.598  29.210
126976 126976
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.107 (192.168.0.107) port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr

87380  87380  1       1      60.00   19281.49  3.40   3.90   7.049   8.089
87380  87380
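
To make the "higher transaction rate, i.e. lower latency" reading concrete, here is a minimal sketch that turns the two reported rates into an average time per transaction. It assumes only one transaction is outstanding at a time (which is how I read "first burst 0"), so the per-transaction time is simply the inverse of the rate. The function and dictionary names are made up for illustration.

```python
# Transaction rates copied from the netperf output above (transactions/sec).
RATES = {
    "SDP_RR": 37868.61,
    "TCP_RR": 19281.49,
}


def round_trip_us(trans_per_sec):
    """Average time per transaction in microseconds.

    Assumes a single transaction in flight, so time = 1 / rate.
    """
    return 1e6 / trans_per_sec


if __name__ == "__main__":
    for test, rate in RATES.items():
        print(f"{test}: {round_trip_us(rate):.1f} us per transaction")
    ratio = RATES["SDP_RR"] / RATES["TCP_RR"]
    print(f"SDP completes about {ratio:.2f}x as many transactions per second")
```

By this arithmetic SDP is a bit under twice as fast per transaction as TCP on these systems.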

The systems here are running RHEL5:
[EMAIL PROTECTED] ~]# uname -a
Linux hpcpc106.cup.hp.com 2.6.18-8.el5 #1 SMP Fri Jan 26 14:16:09 EST 2007 ia64 ia64 ia64 GNU/Linux

and whatever bits come with that. These are not the OFED 1.2 rc bits: I still don't know how to remove enough of what ships with RHEL5 to put all of OFED 1.2 (well, the modules I want) on there without conflict. I'm not sure how to check the versions. Normally I'd use ethtool, but that doesn't work against an ibN device. Someone elsewhere suggested that the bits in RHEL5 might be OFED 1.1.

These systems have four real cores, and no HW threads enabled, so 25% CPU util means that the equivalent of an entire CPU core is being consumed.
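
As a sanity check on the numbers, here is a small sketch of the arithmetic relating system-wide CPU utilization, core count, and service demand. The formula (utilization times number of cores, divided by transaction rate) is my assumption about how the us/Tr column is derived; it does reproduce the reported values to within rounding. The helper names are hypothetical.

```python
NUM_CORES = 4  # four real cores, no HW threads


def core_equivalents(cpu_util_percent, num_cores=NUM_CORES):
    """Convert system-wide CPU utilization (%) into 'whole cores consumed'."""
    return cpu_util_percent / 100.0 * num_cores


def service_demand_us(cpu_util_percent, trans_per_sec, num_cores=NUM_CORES):
    """CPU microseconds consumed per transaction (assumed formula)."""
    return core_equivalents(cpu_util_percent, num_cores) * 1e6 / trans_per_sec


if __name__ == "__main__":
    # (test, local CPU %, transactions/sec) taken from the runs in this mail.
    for test, util, rate in [("SDP_RR", 28.02, 37868.61),
                             ("TCP_RR", 3.40, 19281.49)]:
        print(f"{test}: {core_equivalents(util):.2f} cores, "
              f"{service_demand_us(util, rate):.3f} us/Tr")
```

On that basis the SDP run is consuming slightly more than one full core on each side, while the TCP run uses well under a fifth of one.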

Before I start hitting the system with a profiler I thought I would ask whether this is expected with SDP. Normally a single-instance, single-byte _RR test between otherwise identical systems consumes at most 50% of a core (a bit hand-wavy, but that has been my experience thus far).

rick jones
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general