We’ve identified an issue in OpenShift Enterprise 3.1.1.6 and Origin 1.1.1
that causes SDN interfaces to be set up with incorrect MTU values at startup,
which results in timeouts for SDN traffic. We’ve seen this cause problems
with deployments, metrics, and the integrated registry, to name a few. The
incorrect MTU values will cause problems in any environment configured to
use jumbo frames (MTU 9001), which includes cloud deployments such as AWS. A
fix [1][2] will be released soon, but we wanted to give some background on
the issue in case you run into timeouts and can’t figure out what is
causing them.

The MTU value of SDN interfaces must be lower than the MTU value of eth0
(or the default route interface) by about 50 bytes to account for protocol
overhead.  If, however, the MTU is significantly lower, large packets
entering the SDN bridge from the node (either via eth0, from local node
operations, or from plain Docker containers attached to lbr0) may be
dropped by the kernel due to the MTU mismatch, causing the timeouts.
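
As a rough sketch of that arithmetic, the helpers below look up the default
route interface and subtract the overhead from its MTU. The function names
and the fixed value of 50 are assumptions for illustration (the text above
only says "about 50 bytes"); this is not taken from openshift-sdn itself.

```shell
#!/bin/sh
# Overhead assumed from the "about 50 bytes" figure in the text.
SDN_OVERHEAD=50

# Print the name of the default route interface (for example eth0).
default_route_dev() {
    ip route show default |
        awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Print the MTU the SDN interfaces should have for a given uplink MTU.
expected_sdn_mtu() {
    echo $(( $1 - $SDN_OVERHEAD ))
}

# Example use on a node (not run here):
#   dev=$(default_route_dev)
#   expected_sdn_mtu "$(cat "/sys/class/net/$dev/mtu")"
```

With an uplink MTU of 9001 this gives 8951, which matches the veth
interfaces in the output below.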

In the following example output, the tun0 and lbr0 interfaces have an MTU
of 1500, which is significantly lower than the 9001 MTU of eth0. Thus,
packets larger than 1500 bytes entering the SDN bridge through tun0 may be
dropped by the kernel.

# ifconfig | grep mtu
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
lbr0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
tun0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
veth5018921: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 8951
veth148ab5c: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 8951
veth174f017: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 8951
veth255eb53: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 8951
veth26775cb: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 8951
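
If you want to spot the mismatch without reading the output by eye, something
like the following sketch may help. It reads ifconfig-style lines on stdin
and prints any SDN-related interface whose MTU differs from the expected
value. The function name and the list of interface names it checks (tun0,
lbr0, vovsbr, vlinuxbr, veth*) are assumptions based on the names that appear
in this post.

```shell
#!/bin/sh
# Print "name mtu" for each SDN-related interface whose MTU is not the
# expected value. Input: lines of the form "name: flags=... mtu N".
check_sdn_mtu() {
    expected="$1"
    awk -v want="$expected" '
        {
            name = $1; sub(/:$/, "", name)
            mtu = ""
            for (i = 1; i < NF; i++) if ($i == "mtu") mtu = $(i + 1)
            if (mtu == "") next
            # Interface names assumed from this post; adjust as needed.
            if (name ~ /^(tun0|lbr0|vovsbr|vlinuxbr)$/ || name ~ /^veth/) {
                if (mtu + 0 != want + 0) print name, mtu
            }
        }'
}

# Example use on a node (not run here):
#   ifconfig | grep mtu | check_sdn_mtu 8951
```

Against the output above, with an expected MTU of 8951, this would report
lbr0 and tun0 and stay silent about the veth interfaces.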

The following settings may be added to
/usr/lib/systemd/system/atomic-openshift-node.service as a workaround to
set the MTUs correctly when atomic-openshift-node starts. Adjust the MTU
values as needed, then reload the systemd configuration and restart
atomic-openshift-node for the change to take effect.

[Service]

...

ExecStartPost=/usr/bin/sleep 10
ExecStartPost=/usr/sbin/ip link set dev vovsbr mtu 8951
ExecStartPost=/usr/sbin/ip link set dev vlinuxbr mtu 8951

ExecStartPost=/usr/sbin/ip link set dev tun0 mtu 8951
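
Since the MTU has to be adjusted per environment, here is a small sketch
that prints the ExecStartPost lines above for a given value. The function
name is made up for illustration; the device names and the 10 second sleep
are copied from the settings above.

```shell
#!/bin/sh
# Print the workaround's ExecStartPost lines for a given SDN MTU.
sdn_mtu_unit_lines() {
    mtu="$1"
    echo "ExecStartPost=/usr/bin/sleep 10"
    for dev in vovsbr vlinuxbr tun0; do
        echo "ExecStartPost=/usr/sbin/ip link set dev $dev mtu $mtu"
    done
}

# Example: print the lines for an uplink MTU of 9001 (9001 - 50 = 8951).
sdn_mtu_unit_lines 8951
```

After pasting the lines into the [Service] section, the reload and restart
steps would typically be "systemctl daemon-reload" followed by
"systemctl restart atomic-openshift-node".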

[1] https://github.com/openshift/openshift-sdn/pull/258

[2] https://github.com/openshift/openshift-sdn/pull/261
_______________________________________________
dev mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
