G'Day Trish,
On Tue, Mar 04, 2008 at 06:17:40PM -0800, Siobhan P. Lynch wrote:
> What I'm trying to do is get timestamps on echo packets going out and
> the timestamp on a reply coming in,
"ping -s" is supposed to provide RTT down to the microsecond, if that's
the information you need. If you meant reading timestamps closer to the
NIC rather than in ping/userland - then that's a great question! :)
I've done this before (I thought I blogged about it but I can't find
the entry) the suprising result was that for nearby hosts (<< 1 ms) the
times ping gives you are more to do with ping code and CPU thread
scheduling rather than network latency. I had cooked up an enhanced
version of ping which involved wrapping /usr/sbin/ping with a DTrace
script to do the actual measurements.
> very specifically, I know what the
> client thinks he's looking at, what I'm trying to prove is the issue is
> that he's putting 30 lbs of garbage in a 20 lb. bag (squeezing more data
> down a 144k IDSL line than they should be)
>
> The issue, without getting too much into it is that they are seeing
> extremely high icmp echo reply times through the box, they are blaiming
> it on the box itself, so what I'm trying to do is establish a baseline.
>
> I'm not looking for "the answer" from you, I'm looking for examples
> using DTrace on how to measure the exact times between two packets (one
> request, one reply, icmp or tcp ping) essentially RTT times.
Measuring such times (RTT, 1st byte, TCP connection, etc) is something
DTrace should do very well - and provide processed output such as time
distribution plots.
However the only stable network provider to cover networking is "mib",
and it doesn't provide arguments to allow us to match packets, among
other things. There is a plan (which some of us have been working on
today) to create stable network providers for IP, TCP, ICMP, etc for
customer observability. The following output is a demo from a prototype:
# ./tcp1bl.d
^C
1st Byte Latency (ns) 192.168.101.101
value ------------- Distribution ------------- count
32768 | 0
65536 |@@@@@@ 2
131072 |@@@@@ 1
262144 |@@@@@@@@@@@@@@@ 4
524288 |@@@@@@@@@@@ 3
1048576 |@@@@ 1
2097152 | 0
That script delivers TCP 1st byte latency by hostname. Hot stuff -
and desprately needed. But it isn't available yet - and when it is,
it will be OpenSolaris to start with.
So for now we'll use fbt - the dynamic kernel provider that lets us
see everything. The good news: we can answer those questions. The
bad news: it's hard (you'll learn more Solaris TCP/IP stack code that
you probably want to :), and your scripts may break from one Solaris
release to the next. Which is why we'd prefer stable providers...
DISCLAIMER: THE FOLLOWING FBT SCRIPTS ARE NOT GUARANTEED TO WORK OR BE
ACCURATE. PLEASE CHECK THE DATE ON THIS EMAIL - THE MORE MONTHS THAT
HAVE ELAPSED, THE GREATER THE CHANCE THESE WILL NOT WORK.
I'll paste several more steps than usual, which might be more useful
for people than the fbt scripts themselves. I want to emphasise that
DTracing networking using the fbt provider will require much digging,
experimentation, and code reading; you are unlikely to get an answer
in a hurry.
So we are interested in tracing icmp for accurate timestamps.
I'll start by grepping for icmp probes:
# dtrace -l | grep icmp
30097 fbt ipf frpr_icmp6 entry
30098 fbt ipf frpr_icmp6 return
30111 fbt ipf frpr_icmp entry
30112 fbt ipf frpr_icmp return
30199 fbt ipf fr_checkicmpmatchingstate entry
30200 fbt ipf fr_checkicmpmatchingstate return
30203 fbt ipf fr_checkicmp6matchingstate entry
30204 fbt ipf fr_checkicmp6matchingstate return
30251 fbt ipf nat_icmpquerytype4 entry
...
There were 359 probes listed - maybe too many to read through to start
with. Knowing how programmers name functions may lead to some luck:
# dtrace -l | egrep 'icmp_(send|receive|read|write|tx|rx)'
32796 mib ip icmp_send_redirect
icmpOutRedirects
55711 fbt ip icmp_send_redirect entry
55712 fbt ip icmp_send_redirect return
55935 fbt ip icmp_send_redirect_v6 entry
55936 fbt ip icmp_send_redirect_v6 return
but not in this case.
Now I'll try frequency counting with a known number of events. This is
a quiet workstation where there shouldn't be much other ICMP happening
to confuse the output:
# dtrace -Fn 'fbt::*icmp*:entry { @[probefunc] = count(); }' -c 'ping -s
192.168.1.108 56 7'
dtrace: description 'fbt::*icmp*:entry ' matched 100 probes
PING 192.168.1.108: 56 data bytes
64 bytes from phobos (192.168.1.108): icmp_seq=0. time=0.248 ms
64 bytes from phobos (192.168.1.108): icmp_seq=1. time=0.488 ms
64 bytes from phobos (192.168.1.108): icmp_seq=2. time=0.325 ms
64 bytes from phobos (192.168.1.108): icmp_seq=3. time=0.618 ms
64 bytes from phobos (192.168.1.108): icmp_seq=4. time=0.225 ms
64 bytes from phobos (192.168.1.108): icmp_seq=5. time=0.635 ms
64 bytes from phobos (192.168.1.108): icmp_seq=6. time=0.678 ms
----192.168.1.108 PING Statistics----
7 packets transmitted, 7 packets received, 0% packet loss
round-trip (ms) min/avg/max/stddev = 0.225/0.460/0.678/0.192
dtrace: pid 582501 has exited
icmp_bind 1
icmp_bind_proto 1
icmp_build_hdrs 1
icmp_capability_req 1
icmp_close 1
icmp_close_free 1
icmp_copy_info 1
icmp_open 1
icmp_openv4 1
icmp_bind_ack 2
icmp_bind_result 2
icmp_opt_set 3
icmp_opt_set_locked 3
icmp_wput_other 5
icmp_inbound 7
icmp_wput 12
icmp_input 14
I sent 7 pings, and was hoping for 7 to show up in the output. I got
icmp_inbound(), which sounds promising for the receive, and maybe
icmp_wput() for the send - although it fired 12 times instead of 7.
Lets try mib:
# dtrace -Fn 'mib::: { @[probename] = count(); }' -c 'ping -s 192.168.1.108 56
7'
dtrace: description 'mib::: ' matched 568 probes
PING 192.168.1.108: 56 data bytes
64 bytes from phobos (192.168.1.108): icmp_seq=0. time=0.765 ms
64 bytes from phobos (192.168.1.108): icmp_seq=1. time=0.420 ms
64 bytes from phobos (192.168.1.108): icmp_seq=2. time=0.697 ms
64 bytes from phobos (192.168.1.108): icmp_seq=3. time=0.668 ms
64 bytes from phobos (192.168.1.108): icmp_seq=4. time=0.185 ms
64 bytes from phobos (192.168.1.108): icmp_seq=5. time=0.678 ms
64 bytes from phobos (192.168.1.108): icmp_seq=6. time=0.852 ms
----192.168.1.108 PING Statistics----
7 packets transmitted, 7 packets received, 0% packet loss
round-trip (ms) min/avg/max/stddev = 0.185/0.609/0.852/0.229
dtrace: pid 582512 has exited
ipIfStatsForwProhibits 1
ipIfStatsHCInMcastOctets 1
ipIfStatsHCInMcastPkts 1
ipv6IfIcmpInMsgs 1
ipv6IfIcmpInNeighborAdvertisements 1
tcpOutAckDelayed 3
tcpOutAck 4
icmpInEchoReps 7
icmpInMsgs 7
rawipOutDatagrams 7
tcpInAckBytes 7
tcpInAckSegs 7
tcpOutDataBytes 7
tcpOutDataSegs 7
tcpRttUpdate 7
tcpInDataInorderBytes 9
tcpInDataInorderSegs 9
rawipInDatagrams 14
ipIfStatsHCInOctets 18
ipIfStatsHCInReceives 18
ipIfStatsHCOutOctets 18
ipIfStatsHCOutRequests 18
ipIfStatsHCOutTransmits 18
ipIfStatsHCInDelivers 24
better? I get icmpInMsgs and icmpInEchoReps for inbound, but still
no clear outbound. There is rawipOutDatagrams with a 7 count, which
sounds like it could be icmp.
Lets see who calls rawipOutDatagrams:
# dtrace -Fn 'mib:::rawipOutDatagrams { @[probefunc, stack()] = count(); }' -c
'ping -s 192.168.1.108 56 7'
dtrace: description 'mib:::rawipOutDatagrams ' matched 2 probes
PING 192.168.1.108: 56 data bytes
64 bytes from phobos (192.168.1.108): icmp_seq=0. time=1.794 ms
64 bytes from phobos (192.168.1.108): icmp_seq=1. time=0.462 ms
64 bytes from phobos (192.168.1.108): icmp_seq=2. time=0.457 ms
64 bytes from phobos (192.168.1.108): icmp_seq=3. time=0.457 ms
64 bytes from phobos (192.168.1.108): icmp_seq=4. time=0.468 ms
64 bytes from phobos (192.168.1.108): icmp_seq=5. time=0.447 ms
64 bytes from phobos (192.168.1.108): icmp_seq=6. time=0.458 ms
----192.168.1.108 PING Statistics----
7 packets transmitted, 7 packets received, 0% packet loss
round-trip (ms) min/avg/max/stddev = 0.447/0.649/1.794/0.505
dtrace: pid 582521 has exited
icmp_wput
unix`putnext+0x2f1
genunix`strput+0x1cf
genunix`kstrputmsg+0x2bf
sockfs`sosend_dgram+0x2dd
sockfs`sotpi_sendmsg+0x566
sockfs`sendit+0x1b8
sockfs`sendto+0xb8
sockfs`sendto32+0x2d
unix`sys_syscall32+0x1fc
7
The top level function was icmp_wput() - which sounds like our guy.
So far, we have the following possible icmp receive probes:
fbt::icmp_inbound:entry
mib:::icmpInMsgs
mib:::icmpInEchoReps
And the following possible icmp send probe:
mib::icmp_wput:rawipOutDatagrams
Now might be a good time to read the source - such as bringing up the
icmp_inbound() and icmp_wput() functions in cvs.opensolaris.org, and
checking how and when they bump the MIBs. Having taken a quick look,
they still look promising.
Now to get some timestamps:
# dtrace -n 'mib::icmp_wput:rawipOutDatagrams,mib:::icmpInEchoReps {
trace(timestamp); }' -c 'ping -s 192.168.1.108 56 7'
dtrace: description 'mib::icmp_wput:rawipOutDatagrams,mib:::icmpInEchoReps '
matched 2 probes
PING 192.168.1.108: 56 data bytes
64 bytes from phobos (192.168.1.108): icmp_seq=0. time=0.765 ms
CPU ID FUNCTION:NAME
0 33301 icmp_wput:rawipOutDatagrams 7698507169846581
1 33052 icmp_inbound:icmpInEchoReps 7698507170499718
1 33301 icmp_wput:rawipOutDatagrams 7698508170106314
1 33052 icmp_inbound:icmpInEchoReps 7698508170223800
64 bytes from phobos (192.168.1.108): icmp_seq=1. time=0.366 ms
0 33301 icmp_wput:rawipOutDatagrams 7698509170070724
1 33052 icmp_inbound:icmpInEchoReps 7698509170173117
64 bytes from phobos (192.168.1.108): icmp_seq=2. time=0.306 ms
0 33301 icmp_wput:rawipOutDatagrams 7698510170071133
1 33052 icmp_inbound:icmpInEchoReps 7698510170171805
64 bytes from phobos (192.168.1.108): icmp_seq=3. time=0.232 ms
64 bytes from phobos (192.168.1.108): icmp_seq=4. time=0.161 ms
1 33301 icmp_wput:rawipOutDatagrams 7698511170081670
1 33052 icmp_inbound:icmpInEchoReps 7698511170150712
64 bytes from phobos (192.168.1.108): icmp_seq=5. time=0.179 ms
1 33301 icmp_wput:rawipOutDatagrams 7698512170195437
1 33052 icmp_inbound:icmpInEchoReps 7698512170273791
1 33301 icmp_wput:rawipOutDatagrams 7698513170087531
64 bytes from phobos (192.168.1.108): icmp_seq=6. time=0.184 ms
----192.168.1.108 PING Statistics----
7 packets transmitted, 7 packets received, 0% packet loss
round-trip (ms) min/avg/max/stddev = 0.161/0.313/0.765/0.213
dtrace: pid 582528 has exited
1 33052 icmp_inbound:icmpInEchoReps 7698513170175988
Beware of the CPU column shuffling - the events may not printed in the
correct order (post sort those timestamps). So this is working, but
the output is rather rough.
So, cutting corners based on some assumptions - in particular that your
server is ICMP quiet and packet associations are not needed:
# cat ping.d
#!/usr/sbin/dtrace -s
#pragma D option quiet
mib::icmp_wput:rawipOutDatagrams
{
/*
* Assume your server is ICMP quiet, and no packet association
* is necessary.
*/
start = timestamp;
}
mib:::icmpInEchoReps
/start/
{
@["ICMP round-trip (ns)"] = quantize(timestamp - start);
start = 0;
}
# ./ping.d -c 'ping -s 192.168.1.108 56 7'
PING 192.168.1.108: 56 data bytes
64 bytes from phobos (192.168.1.108): icmp_seq=0. time=0.547 ms
64 bytes from phobos (192.168.1.108): icmp_seq=1. time=0.186 ms
64 bytes from phobos (192.168.1.108): icmp_seq=2. time=0.470 ms
64 bytes from phobos (192.168.1.108): icmp_seq=3. time=0.461 ms
64 bytes from phobos (192.168.1.108): icmp_seq=4. time=0.450 ms
64 bytes from phobos (192.168.1.108): icmp_seq=5. time=0.450 ms
64 bytes from phobos (192.168.1.108): icmp_seq=6. time=0.456 ms
----192.168.1.108 PING Statistics----
7 packets transmitted, 7 packets received, 0% packet loss
round-trip (ms) min/avg/max/stddev = 0.186/0.431/0.547/0.113
ICMP round-trip (ns)
value ------------- Distribution ------------- count
32768 | 0
65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 7
131072 | 0
Oh, that quantize is rather compressed... Lets make it an lquantize():
# cat ping.d
#!/usr/sbin/dtrace -s
#pragma D option quiet
mib::icmp_wput:rawipOutDatagrams
{
/*
* Assume your server is ICMP quiet, and no packet association
* is necessary.
*/
start = timestamp;
}
mib:::icmpInEchoReps
/start/
{
@["ICMP round-trip (ns)"] = lquantize(timestamp - start, 0, 1000000,
1000);
start = 0;
}
# ./ping.d -c 'ping -s 192.168.1.108 56 7'
PING 192.168.1.108: 56 data bytes
64 bytes from phobos (192.168.1.108): icmp_seq=0. time=0.526 ms
64 bytes from phobos (192.168.1.108): icmp_seq=1. time=0.189 ms
64 bytes from phobos (192.168.1.108): icmp_seq=2. time=0.457 ms
64 bytes from phobos (192.168.1.108): icmp_seq=3. time=0.460 ms
64 bytes from phobos (192.168.1.108): icmp_seq=4. time=0.444 ms
64 bytes from phobos (192.168.1.108): icmp_seq=5. time=0.474 ms
64 bytes from phobos (192.168.1.108): icmp_seq=6. time=0.463 ms
----192.168.1.108 PING Statistics----
7 packets transmitted, 7 packets received, 0% packet loss
round-trip (ms) min/avg/max/stddev = 0.189/0.430/0.526/0.110
ICMP round-trip (ns)
value ------------- Distribution ------------- count
89000 | 0
90000 |@@@@@@ 1
91000 |@@@@@@@@@@@ 2
92000 | 0
93000 |@@@@@@ 1
94000 | 0
95000 | 0
96000 | 0
97000 | 0
98000 |@@@@@@ 1
99000 | 0
100000 |@@@@@@ 1
101000 |@@@@@@ 1
102000 | 0
Looking good. Note that the measured times are between 0.090 and 0.102 ms,
unlike the summary from ping - which has an average of 0.430 ms!
That script is a bit junky, the follwing excercises may illustrate why...
Excercises for the reader:
1) Read the source behind those probes - are they measuring what
we assumed them to measure?
2) associate packets properly by digging out IP and ICMP fields
(difficult, and would currently require fbt probes, not mib probes).
3) Can fbt probes be picked that are closer in the code path to the
interface, to improve the accuracy of the time measurements?
(also difficult).
4) mib::icmp_wput:rawipOutDatagrams is unstable, since it specified
the probefunc field. Is there a better mib probe, or combination of
mib probes that can be used to make this script stable?
(this would exclude tasks (2) and (3)).
...
This is by no means an authorative answer on the topic, and covers
unstable interfaces that are unlikely to work in the future. But it is
an exciting topic for DTrace, and once we have stable network providers
we'll be able to really get started writing some cool and useful
scripts...
Brendan
--
Brendan
[CA, USA]
_______________________________________________
dtrace-discuss mailing list
[email protected]