James,

First, you have determined that the large packet size works between
the two systems.  What appears to be happening on SLES9 is that once
you trigger the ICMP "packet too large" response, the TCP/IP stack
dynamically lowers the reported MTU to 32000.  Once communication with
a smaller packet occurs, the MTU returns to the full size, as evidenced
by the increasing sizes succeeding until you again try packets that
exceed the actual configured MTU.  It appears as though the IP stack is
trying to say to the upper layer protocol (in this case the ping
program), 'Hey dummy, your packets are too big.  I told you the real
size and you did not listen, so here is an even smaller size' in an
attempt to get ping to present packets small enough to succeed.  Ping,
of course, only sends what you told it to send, so it doesn't listen.
TCP, of course, would listen.
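
One possible source of the 32000 figure: RFC 1191 lists a table of
common MTU "plateaus" that a host can fall back on when a
"fragmentation needed" message does not give it a usable smaller MTU,
and 32000 is the first plateau below 57344.  Whether the SLES9 kernel
is actually consulting such a table here is an assumption, not
something these tests confirm.  A minimal Python sketch of that
fallback (the function name is hypothetical):

```python
# Plateau values from the table of common MTUs in RFC 1191, section 7.1.
RFC1191_PLATEAUS = [65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68]

def next_lower_plateau(current_mtu):
    """Return the largest plateau strictly below current_mtu (hypothetical helper)."""
    for plateau in RFC1191_PLATEAUS:  # list is in descending order
        if plateau < current_mtu:
            return plateau
    return RFC1191_PLATEAUS[-1]  # do not go below the smallest plateau, 68

print(next_lower_plateau(57344))  # prints 32000
```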

If I recall correctly, this is a new Linux instance which you are
testing and which has not gone into production.  If I were in your
shoes, knowing what you know, I would attempt to run some production
type work, some large FTPs for example, and see if they run like they
do on the SLES8 system.  MTU issues usually show up as increased
elapsed time (or possibly increased CPU cycles).  You have obviously
uncovered different and somewhat odd behavior, but whether it will
actually negatively affect production is unclear.

The data from these tests are also valuable in demonstrating the
phenomenon if you want to pursue it through Novell support.  Someone who
understands the actual kernel IP stack code and what it is trying to do
is needed to provide a definitive explanation.  My interpretation above
is just that.  Rather than focusing on these results, I recommend
focusing on finding out if anything is actually "broken."  My hunch is
that it isn't, but you need to build some confidence that that is the
case.

Harold Grovesteen

James Melin wrote:

I went and ran the ping test to see how SLES8 vs SLES9 would behave given the 
tracepath differences I've found.
Some background to just refresh:

Hipersocket interface is set to 64 K on the CHPID. There is 8K of overhead so 
the MTU is really 57344, and that is what the interface sets itself to
at IPL time. I have tried hard coding MTU and letting the interface discover 
MTU by itself, the results are the same.
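
The arithmetic above, and the largest ping payload that fits in one
unfragmented packet, can be sketched as follows.  This assumes a 20-byte
IPv4 header with no options and an 8-byte ICMP header, which matches the
57316(57344) pairs that ping prints in the tests; the constant and
function names are made up for illustration.

```python
FRAME_SIZE = 64 * 1024   # maximum frame size set on the CHPID
OVERHEAD = 8 * 1024      # hipersocket overhead quoted above
IPV4_HEADER = 20         # assumption: IPv4 header without options
ICMP_HEADER = 8          # ICMP echo header

def hipersocket_mtu(frame_size=FRAME_SIZE, overhead=OVERHEAD):
    """MTU left over once the overhead is taken out of the frame size."""
    return frame_size - overhead

def max_ping_payload(mtu):
    """Largest value for ping -s that still fits in one packet of size mtu."""
    return mtu - IPV4_HEADER - ICMP_HEADER

mtu = hipersocket_mtu()
print(mtu, max_ping_payload(mtu))  # prints 57344 57316
```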

--------------- SNIP-------------

In your case, you should succeed on the second try and the MTU would
immediately drop to 1492. So, something else is going on with the
tracepath program. Without a packet trace of what is actually going on,
as seen by the network and the program, it is difficult to figure out
what is really occurring. In this case the problem appears to be in the
host from which the trace route is issued, SLES9.

There are two questions: Is the MTU really 1492 and why is tracepath
acting as it is? The first is of the most concern.

I would recommend using a ping with a payload size of 57344 and the
do-not-fragment option enabled. If this does not work, then remove the
do-not-fragment option and see if it succeeds.

----------- SNIP -------------

The following is lengthy and I apologize for that, but I think it's relevant 
because of the differences between SLES8 and SLES9 in the ping response
when packet sizes are set.

Results of ping testing with packet sizes specified - SLES 8 system.

One byte shy of MTU. Works as expected

nokomis:~ #  ping -c 5 -M do -s 57315 192.168.252.1
PING 192.168.252.1 (192.168.252.1) from 192.168.252.16 : 57315(57343) bytes of 
data.
57323 bytes from 192.168.252.1: icmp_seq=1 ttl=64 time=46.7 ms
57323 bytes from 192.168.252.1: icmp_seq=2 ttl=64 time=44.6 ms
57323 bytes from 192.168.252.1: icmp_seq=3 ttl=64 time=1.13 ms
57323 bytes from 192.168.252.1: icmp_seq=4 ttl=64 time=0.906 ms
57323 bytes from 192.168.252.1: icmp_seq=5 ttl=64 time=1.93 ms

--- 192.168.252.1 ping statistics ---
5 packets transmitted, 5 received, 0% loss, time 4036ms
rtt min/avg/max/mdev = 0.906/19.075/46.751/21.754 ms

Packet size = MTU - works as expected.

nokomis:~ #  ping -c 5 -M do -s 57316 192.168.252.1
PING 192.168.252.1 (192.168.252.1) from 192.168.252.16 : 57316(57344) bytes of 
data.
57324 bytes from 192.168.252.1: icmp_seq=1 ttl=64 time=2.77 ms
57324 bytes from 192.168.252.1: icmp_seq=2 ttl=64 time=3.87 ms
57324 bytes from 192.168.252.1: icmp_seq=3 ttl=64 time=0.991 ms
57324 bytes from 192.168.252.1: icmp_seq=4 ttl=64 time=0.807 ms
57324 bytes from 192.168.252.1: icmp_seq=5 ttl=64 time=17.7 ms

--- 192.168.252.1 ping statistics ---
5 packets transmitted, 5 received, 0% loss, time 4043ms
rtt min/avg/max/mdev = 0.807/5.235/17.737/6.354 ms

Exceed packet size, Failure reported - works as expected.

nokomis:~ #  ping -c 5 -M do -s 57317 192.168.252.1
PING 192.168.252.1 (192.168.252.1) from 192.168.252.16 : 57317(57345) bytes of 
data.
ping: local error: Message too long, mtu=57344
ping: local error: Message too long, mtu=57344
ping: local error: Message too long, mtu=57344
ping: local error: Message too long, mtu=57344
ping: local error: Message too long, mtu=57344


As can be seen, under SLES8, it eventually reports I've exceeded the MTU size.

Under SLES9, the behaviour is different:

Deliberately exceeded Buffer size - Warning as expected.

abinodji:~ # ping -c 5 -M do -s 57320 192.168.252.1
PING 192.168.252.1 (192.168.252.1) 57320(57348) bytes of data.
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 57344)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 57344)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 57344)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 57344)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 57344)

--- 192.168.252.1 ping statistics ---
0 packets transmitted, 0 received, +5 errors

Made sure the resulting packet would not be OVER the MTU. Works ok.

abinodji:~ # ping -c 5 -M do -s 57310 192.168.252.1
PING 192.168.252.1 (192.168.252.1) 57310(57338) bytes of data.
57318 bytes from 192.168.252.1: icmp_seq=1 ttl=64 time=0.907 ms
57318 bytes from 192.168.252.1: icmp_seq=2 ttl=64 time=1.08 ms
57318 bytes from 192.168.252.1: icmp_seq=3 ttl=64 time=0.891 ms
57318 bytes from 192.168.252.1: icmp_seq=4 ttl=64 time=1.51 ms
57318 bytes from 192.168.252.1: icmp_seq=5 ttl=64 time=1.96 ms

--- 192.168.252.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4633ms
rtt min/avg/max/mdev = 0.891/1.271/1.961/0.411 ms

2nd deliberately too large request throws same error. Expected.

abinodji:~ # ping -c 5 -M do -s 57318 192.168.252.1
PING 192.168.252.1 (192.168.252.1) 57318(57346) bytes of data.
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 57344)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 57344)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 57344)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 57344)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 57344)

--- 192.168.252.1 ping statistics ---
0 packets transmitted, 0 received, +5 errors

Set a packet size that results in a packet equalling the MTU and it fails, 
reporting the MTU is now 32000!

abinodji:~ # ping -c 5 -M do -s 57316 192.168.252.1
PING 192.168.252.1 (192.168.252.1) 57316(57344) bytes of data.
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)

--- 192.168.252.1 ping statistics ---
0 packets transmitted, 0 received, +5 errors

Set a packet size that results in a packet two bytes smaller than the MTU 
specified on the CHPID, which the ifconfig command still indicates is 57344.
MTU returned as 32000.

abinodji:~ # ping -c 5 -M do -s 57314 192.168.252.1
PING 192.168.252.1 (192.168.252.1) 57314(57342) bytes of data.
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)

--- 192.168.252.1 ping statistics ---
0 packets transmitted, 0 received, +5 errors

Again, same result with an even smaller packet.

abinodji:~ # ping -c 5 -M do -s 57312 192.168.252.1
PING 192.168.252.1 (192.168.252.1) 57312(57340) bytes of data.
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)

--- 192.168.252.1 ping statistics ---
0 packets transmitted, 0 received, +5 errors

31000 packet size works. No complaint

abinodji:~ # ping -c 5 -M do -s 31000 192.168.252.1
PING 192.168.252.1 (192.168.252.1) 31000(31028) bytes of data.
31008 bytes from 192.168.252.1: icmp_seq=1 ttl=64 time=1.49 ms
31008 bytes from 192.168.252.1: icmp_seq=2 ttl=64 time=4.88 ms
31008 bytes from 192.168.252.1: icmp_seq=3 ttl=64 time=0.821 ms
31008 bytes from 192.168.252.1: icmp_seq=4 ttl=64 time=0.926 ms
31008 bytes from 192.168.252.1: icmp_seq=5 ttl=64 time=0.620 ms

--- 192.168.252.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4046ms
rtt min/avg/max/mdev = 0.620/1.747/4.881/1.593 ms

Deliberately exceeding the aforementioned 32000 MTU by 27 bytes works!

abinodji:~ # ping -c 5 -M do -s 31999 192.168.252.1
PING 192.168.252.1 (192.168.252.1) 31999(32027) bytes of data.
32007 bytes from 192.168.252.1: icmp_seq=1 ttl=64 time=0.723 ms
32007 bytes from 192.168.252.1: icmp_seq=2 ttl=64 time=0.581 ms
32007 bytes from 192.168.252.1: icmp_seq=3 ttl=64 time=0.984 ms
32007 bytes from 192.168.252.1: icmp_seq=4 ttl=64 time=0.731 ms
32007 bytes from 192.168.252.1: icmp_seq=5 ttl=64 time=1.79 ms

--- 192.168.252.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4035ms
rtt min/avg/max/mdev = 0.651/0.955/1.308/0.263 ms

Increase more.

abinodji:~ # ping -c 5 -M do -s 33000 192.168.252.1
PING 192.168.252.1 (192.168.252.1) 33000(33028) bytes of data.
33008 bytes from 192.168.252.1: icmp_seq=1 ttl=64 time=2.30 ms
33008 bytes from 192.168.252.1: icmp_seq=2 ttl=64 time=0.703 ms
33008 bytes from 192.168.252.1: icmp_seq=3 ttl=64 time=10.9 ms
33008 bytes from 192.168.252.1: icmp_seq=4 ttl=64 time=0.788 ms
33008 bytes from 192.168.252.1: icmp_seq=5 ttl=64 time=1.25 ms

--- 192.168.252.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 5232ms
rtt min/avg/max/mdev = 0.703/3.207/10.980/3.928 ms

increase more still.

abinodji:~ # ping -c 5 -M do -s 56000 192.168.252.1
PING 192.168.252.1 (192.168.252.1) 56000(56028) bytes of data.
56008 bytes from 192.168.252.1: icmp_seq=1 ttl=64 time=1.02 ms
56008 bytes from 192.168.252.1: icmp_seq=2 ttl=64 time=0.856 ms
56008 bytes from 192.168.252.1: icmp_seq=3 ttl=64 time=1.32 ms
56008 bytes from 192.168.252.1: icmp_seq=4 ttl=64 time=1.28 ms
56008 bytes from 192.168.252.1: icmp_seq=5 ttl=64 time=0.893 ms


--- 192.168.252.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4052ms
rtt min/avg/max/mdev = 1.019/4.823/18.723/6.959 ms

Increase again.

abinodji:~ # ping -c 5 -M do -s 57300 192.168.252.1
PING 192.168.252.1 (192.168.252.1) 57300(57328) bytes of data.
57308 bytes from 192.168.252.1: icmp_seq=1 ttl=64 time=0.846 ms
57308 bytes from 192.168.252.1: icmp_seq=2 ttl=64 time=0.813 ms
57308 bytes from 192.168.252.1: icmp_seq=3 ttl=64 time=0.839 ms
57308 bytes from 192.168.252.1: icmp_seq=4 ttl=64 time=0.897 ms
57308 bytes from 192.168.252.1: icmp_seq=5 ttl=64 time=0.902 ms

--- 192.168.252.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4037ms
rtt min/avg/max/mdev = 0.813/0.859/0.902/0.043 ms

Increase again past max MTU - complaint as expected.

abinodji:~ # ping -c 5 -M do -s 57340 192.168.252.1
PING 192.168.252.1 (192.168.252.1) 57340(57368) bytes of data.
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 57344)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 57344)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 57344)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 57344)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 57344)

--- 192.168.252.1 ping statistics ---
0 packets transmitted, 0 received, +5 errors

Packet size = MTU, the 32000 stuff appears.

abinodji:~ # ping -c 5 -M do -s 57316 192.168.252.1
PING 192.168.252.1 (192.168.252.1) 57316(57344) bytes of data.
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)

--- 192.168.252.1 ping statistics ---
0 packets transmitted, 0 received, +5 errors

Should be under 57344 MTU but this complains that it's now 32000 also

abinodji:~ # ping -c 5 -M do -s 57315 192.168.252.1
PING 192.168.252.1 (192.168.252.1) 57315(57343) bytes of data.
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)

--- 192.168.252.1 ping statistics ---
0 packets transmitted, 0 received, +5 errors
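
To summarise runs like these for a support case, the MTU that each ping
run reports can be pulled out of the captured output.  A rough Python
sketch, assuming only the two error formats seen above ("Frag needed and
DF set (mtu = N)" on SLES9 and "Message too long, mtu=N" on SLES8); the
function name is hypothetical.

```python
import re

# Matches both "(mtu = 32000)" and "mtu=57344" as printed in the runs above.
MTU_PATTERN = re.compile(r"mtu\s*=\s*(\d+)")

def reported_mtus(ping_output):
    """Return the distinct MTU values reported in one ping run, in order seen."""
    seen = []
    for line in ping_output.splitlines():
        match = MTU_PATTERN.search(line)
        if match:
            value = int(match.group(1))
            if value not in seen:
                seen.append(value)
    return seen

sample = """PING 192.168.252.1 (192.168.252.1) 57316(57344) bytes of data.
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)
From 192.168.252.22 icmp_seq=1 Frag needed and DF set (mtu = 32000)
"""
print(reported_mtus(sample))  # prints [32000]
```

An empty list means the run reported no MTU complaint at all, which is
the case for the runs where all five replies came back.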

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390




