Ian Downes created MESOS-4105:
---------------------------------
Summary: Network isolator causes corrupt packets to reach
application
Key: MESOS-4105
URL: https://issues.apache.org/jira/browse/MESOS-4105
Project: Mesos
Issue Type: Bug
Components: isolation
Affects Versions: 0.25.0, 0.24.1, 0.24.0, 0.23.1, 0.23.0, 0.22.2, 0.22.1,
0.22.0, 0.21.2, 0.21.1, 0.21.0, 0.20.1, 0.20.0
Reporter: Ian Downes
Priority: Critical
The optional network isolator (network/port_mapping) can let corrupt TCP
packets reach the application, which could lead to data corruption in
applications. Normally the network stack drops packets that fail checksum
verification immediately, so they never reach the application.
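
For readers unfamiliar with how the stack recognizes a corrupt packet, here is
a minimal sketch of the 16-bit one's-complement Internet checksum (RFC 1071)
that TCP uses. It is an illustration only: a real TCP checksum also covers a
pseudo-header and the TCP header, and the payload and helper names below are
made up for the example.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def verifies(segment: bytes) -> bool:
    """Data that embeds its own checksum sums to zero when intact."""
    return internet_checksum(segment) == 0


# Hypothetical 12-byte payload (even length, so the checksum stays aligned).
payload = b"hello, mesos"
segment = payload + internet_checksum(payload).to_bytes(2, "big")

corrupted = bytearray(segment)
corrupted[0] ^= 0x01  # flip a single bit "in transit"

print(verifies(segment))           # intact data passes
print(verifies(bytes(corrupted)))  # corrupted data fails and would be dropped
```

A receiver that performs this check discards the corrupted segment; the bug
described here is that the check appears to be skipped on the container side.
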
Networks may have a very low level of corrupt packets (a few per million) or
may have very high levels if there are hardware or software errors in
networking equipment.
Investigation is ongoing but an initial hypothesis is being tested:
1) The checksum error is correctly detected by the host interface.
2) The Mesos tc filters used by the network isolator redirect the packet to the
virtual interface, even when a checksum error has occurred.
3) The checksum flag is cleared, either when the packet is copied to the veth
device or when it passes across the veth pipe.
4) The veth inside the container does not verify the checksum, even though TCP
RX checksum offloading is supposedly on. [This is hypothesized to be
acceptable normally because it's receiving packets over the virtual link where
corruption should not occur.]
5) The container network stack accepts the packet and delivers it to the
application.
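
The hypothesized path can be sketched as a toy model. This is not kernel or
Mesos code: the Packet class, its fields, and the function names are all
invented for illustration, and the model simply encodes the five steps above
as stated, which are themselves still an unconfirmed hypothesis.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    checksum_valid: bool       # ground truth: would a software check pass?
    flagged_bad: bool = False  # stand-in for the receive-path checksum status


def host_interface_rx(pkt: Packet) -> Packet:
    # Step 1: the host interface detects the bad checksum and flags it.
    pkt.flagged_bad = not pkt.checksum_valid
    return pkt


def redirect_over_veth(pkt: Packet) -> Packet:
    # Steps 2-3: the packet is redirected regardless of the flag, and the
    # flag is lost on the way (the hypothesized fault).
    pkt.flagged_bad = False
    return pkt


def container_rx(pkt: Packet, rx_checksum_offload: bool) -> bool:
    """Return True if the packet is delivered to the application."""
    if rx_checksum_offload:
        # Step 4: the container veth trusts the (now cleared) flag.
        return not pkt.flagged_bad
    # With offload disabled, the stack recomputes the checksum in software.
    return pkt.checksum_valid


def delivered(checksum_valid: bool, rx_checksum_offload: bool) -> bool:
    pkt = redirect_over_veth(host_interface_rx(Packet(checksum_valid)))
    return container_rx(pkt, rx_checksum_offload)


print(delivered(checksum_valid=False, rx_checksum_offload=True))   # step 5
print(delivered(checksum_valid=False, rx_checksum_offload=False))
```

In this model a corrupt packet is delivered only when the container side
relies on the offload flag, which matches the behaviour reported in this issue.
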
Disabling TCP RX checksum offloading (cso) on the container veth appears to
fix this: it forces the container network stack to compute the packet
checksums in software, so it detects the checksum errors and does not deliver
the corrupt packets to the application.
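
As a rough sketch of how the workaround could be applied by hand, the snippet
below wraps the standard ethtool offload switch (`ethtool -K <dev> rx off`).
It assumes ethtool is installed and that the code runs as root inside the
container's network namespace. The interface name eth0 and the helper names
are assumptions for the example, and this is not necessarily how Mesos itself
would apply a fix.

```python
import subprocess
from typing import List


def rx_checksum_off_command(interface: str) -> List[str]:
    """Build the ethtool invocation that turns off RX checksum offload."""
    return ["ethtool", "-K", interface, "rx", "off"]


def disable_rx_checksum_offload(interface: str) -> None:
    # Must run as root inside the container's network namespace.
    # check=True raises CalledProcessError if ethtool reports a failure.
    subprocess.run(rx_checksum_off_command(interface), check=True)


# "eth0" is an assumed name for the veth end inside the container.
print(" ".join(rx_checksum_off_command("eth0")))
```

The current setting can be inspected with `ethtool -k <dev>`, which lists the
rx-checksumming state among the other offload features.
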
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)