I fear I'm gonna start a flame war here, but here goes anyway.
WARNING: I am NOT speaking for IBM here ... this is all personal
experience. I also want to apologize in advance for the length
of this post.

Network bandwidth is measured point-to-point. Leland's test using
netpipe is probably the best indicator of what this _theoretical_
maximum is. Network bandwidth is always stated _independent_ of
IP stack processing time and other interference factors. Gigabit
Ethernet means that the maximum transmission rate that the NIC
can handle is 1000 Mbps (that's "bits"; 1000 / 8 = 125 MBytes/sec). Like
any other I/O method, you can't go any faster than your most
restrictive bottleneck will allow.

FTP is a __TERRIBLE__ benchmark. Remember that besides network
bandwidth, other factors that might introduce latency are things
like OS scheduling delays/timeslicing on both client and server
ends, disk I/O delays on both ends, etc; _not_to_mention_ TCP
SYN/ACK traffic, possible retransmissions, packet fragmentation
and reassembly, etc.

Here's a better, but much more complicated, way to test if you
can't afford (or author) benchmarking software....

The two peers you test with have _got_ to have ZERO routing hops
between them: routing can also introduce latency; but with
hipersockets, this isn't supposed to be an issue. You may not
think that a typical LAN routing latency of 4-6 ms is a big deal,
but the more packets that get sent (and ACKed), the more _any_
latency will hurt your final net throughput value. We're talking
about possibly thousands of packets, and a few thousand packets
times even a few milliseconds each adds up to whole seconds of
latency, at least.

The fastest way to the IP stack that I know of without going
raw is ICMP (ping). You can adjust the ping packet size to a
value close to, but below, the lower of the two peers' MTU sizes
(you don't want fragmentation involved, either, and IIRC,
even large ICMP packets will fragment), then set "flood ping" mode,
and let 'er rip. Some peers will refuse to run "flood" mode ...
the VM TCPIP peer I have a PPP connection with inserts a 200 ms
interval, so you cannot use the total time reported at the end,
but the averages work. In these cases, you need to run an "adaptive"
ping (-A) instead of a flood (-f).
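
For the record, the two invocations look like this (these are the
Linux iputils ping flags; flood mode needs root, and other
platforms may spell these differently):

      ping -f -c 100 -s 1456 9.10.11.12    # flood
      ping -A -c 100 -s 1456 9.10.11.12    # adaptive, for throttling peers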

Here's how this works: I figure the maximum ping data size (the
-s value) to be the MTU - 20 (IP header) - 8 (ICMP header) - 8
(ICMP_ECHO_REQUEST). This works out to 1456 for me.
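
If you don't know your interface MTU offhand, ifconfig will show
it (eth0 here is just an example; substitute your own interface),
and the arithmetic is easy to check:

      ifconfig eth0 | grep -i mtu     # look for MTU:1492 (or whatever yours is)
      expr 1492 - 20 - 8 - 8          # prints 1456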

Here's what I get with my VCTC connected VM TCPIP peer, with my
MTU set at 1492 on a brand-spanking new z/990 running z/VM 4.4:
================================================================
ping -A -c 100 -s 1456 9.10.11.12
< snip >
--- 9.10.11.12 ping statistics ---
100 packets transmitted, 100 received, 0% loss, time 20788ms
rtt min/avg/max/mdev = 0.159/0.224/0.344/0.043 ms
================================================================

I can't use "time 20788 ms" as reported: there's 200 ms of interval
inserted by the peer for every ping. So instead, I can calculate
total time as rtt avg (0.224 ms) times the packet count (100),
to yield 22.4 _milli_seconds, or 0.0224 seconds. Since this is
round trip (rtt), I can divide this by 2, yielding 0.0112 seconds.
The total data transmitted was 1492 bytes X packet count (100),
yielding 149,200 bytes. The net bandwidth calculation now works
out to

      149,200
      ------- = 13,321,428 B/sec
       0.0112

(note the big 'B': that's "bytes", not "bits". Little 'b' is
supposed to mean 'bits', but this convention isn't always
followed)

    13,321,428 / (1024 X 1024) = 12.70 MB/s

This is a NET figure, and it really doesn't look stellar, does
it? That's because IP stack time, in and out, is still being
measured. So we repeat this test with a minimum packet data size
(-s 16, yielding a total packet size of 52 bytes).
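Same invocation as above, just with the smaller payload:

      ping -A -c 100 -s 16 9.10.11.12

which reports: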

rtt min/avg/max/mdev = 0.128/0.196/0.456/0.061 ms

That's 52 bytes X 100 packets = 5,200 bytes, moved in 0.196 ms
avg X 100 / 2 = 0.0098 seconds one-way:

       5,200
      ------ = 530,612 B/sec, or about 518 KB/s
      0.0098

Big difference, huh? But if we look at these measurements out of
context, we're gonna scratch our heads, much like the first
inquirer did. We need to look at them _comparatively_ to figure
out what the _delta_ is between the two transmission rates. Where
Blarge = bytes transferred in the large packet test, Bsmall =
bytes transferred in the small packet test, Tlarge = time consumed
in the large packet test, and Tsmall = time consumed in the small
packet test, the following measures what I'll call the "absolute
bandwidth": the rate at which the _additional_ data is
transferred, effectively removing the IP stack interference
factor:

      Blarge - Bsmall
      --------------- = Absolute bandwidth
      Tlarge - Tsmall

Which yields, in my case:

      149,200 - 5,200
      ---------------
      0.0112 - 0.0098

      144,000
      ------- = 102,857,142 B/sec
       0.0014

   ... or 98 MB/sec (or 784 Mbps, however you wanna look at it).

"Absolute bandwidth" is the actual transmission speed, "in-pipe". It
can be reduced by other traffic sharing that same pipe with you
as you test. I just ran this test a few minutes ago, in the midst of
a production day, which isn't an accurate or wise thing to do. You
wanna catch both peers (in this case, VM and the Linux guest) unloaded,
like in the middle of the night.
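
If you want to automate the whole dance, here's a rough sketch as
a shell script. It is NOT gospel: the peer address is the one from
my runs above, the rtt parsing assumes the Linux iputils summary
line you see in my output, and the "+ 36" header overhead follows
my packet-size accounting above; adjust all of it to taste.
================================================================
#!/bin/sh
# Sketch: run a large- and a small-packet adaptive ping, pull the
# average rtt out of each summary line, and compute
# (Blarge - Bsmall) / (Tlarge - Tsmall).
PEER=9.10.11.12     # the peer from the runs above; substitute yours
COUNT=100
LARGE=1456          # MTU 1492 - 20 - 8 - 8, per the arithmetic above
SMALL=16

avg_rtt() {         # average rtt in ms, from "rtt min/avg/max/mdev = ..."
    ping -A -c $COUNT -s $1 $PEER | awk -F'/' '/^rtt/ { print $5 }'
}

TLARGE=`avg_rtt $LARGE`
TSMALL=`avg_rtt $SMALL`

# Bytes on the wire per test: (-s size + 36 header bytes) * count,
# matching the 1492- and 52-byte totals above. One-way seconds:
# avg rtt (ms) * count / 2 / 1000.
awk -v tl=$TLARGE -v ts=$TSMALL -v c=$COUNT -v l=$LARGE -v s=$SMALL '
BEGIN {
    bl = (l + 36) * c; bs = (s + 36) * c
    secl = tl * c / 2000; secs = ts * c / 2000
    bw = (bl - bs) / (secl - secs)
    printf "absolute bandwidth: %.0f B/sec (%.2f MB/s)\n", bw, bw / (1024 * 1024)
}'
================================================================
Plug in my averages from above (0.224 ms and 0.196 ms) and it
prints the same 98 MB/s figure.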

The specific numbers I'm reporting don't really mean anything: I just
wanted to show how to __really__ measure bandwidth, and not lean on
something completely misleading like FTP. I'm sure some math PhD out
there (I'm not one) will tear this apart, but it's close enough for
government work :-)

"Gigabit Ethernet" never meant you'd get 125 MB/s in FTP or NFS. Ever.
It means that the net "pipe" can handle at maximum 125 MB/s in or out.
Actual transmission speeds? Well, YMMV, as you especially see with z/OS.
Effective bandwidth (what happens when you put two TCP peers together
and see how much data they can transfer) is MUCH lower: always.

As far as I know, ping would be the best "poor man's net bandwidth
benchmark". Otherwise, specialized benchmark software and a carefully
controlled peer pair would be required to measure. IIRC, Stevens
discusses using ping to determine bandwidth in "TCP/IP Illustrated".

Regards,
--Jim--
James S. Tison
Senior Software Engineer
TPF Laboratory / Architecture
IBM Corporation
"If dogs don't go to heaven, then, when I die, I want to go where they
do."
   -- Will Rogers



"Lucius, Leland" <[EMAIL PROTECTED]>
Sent by: Linux on 390 Port <[EMAIL PROTECTED]>
05/28/2004 12:40
Please respond to
Linux on 390 Port


To
[EMAIL PROTECTED]
cc

Subject
Re: Did some extensive hipersocket testing/benchmarking.... need help
interpreting results.






>
> 125 Storing data set /it/public/Su810_001.iso
> 100% |*************************************|   595 MB    2.68
> MB/s    --:--
> ETA
> 250 Transfer completed successfully.
> 624885855 bytes sent in 03:41 (2.68 MB/s)
>
Depressing, isn't it?  I've never been able to get much (or any)
better than what you have.  I don't know why, either.  The only
thing that's kept me from goin' round the twist was a little
program called netpipe.  It's a completely synthetic benchmark
program, but it at least showed me that the hipersockets CAN
achieve high throughput.  I just don't know how to make it happen
for FTP between Linux and z/OS.

Here's the best rate I got from a run I just did:

  7:     39936 bytes 1010 times -->  1274.84 Mbps in 0.000239 sec,
     avg utime=0.000000 avg stime=0.000226,
     min utime=0.000000 stime=0.000198,
     utime var=0.000000 stime var=0.000000

If you find out, I sure would appreciate it if you'd clue me in on the
secret.

Leland

