[
https://issues.apache.org/jira/browse/MESOS-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ian Downes updated MESOS-4105:
------------------------------
Assignee: Cong Wang
> Network isolator causes corrupt packets to reach application
> ------------------------------------------------------------
>
> Key: MESOS-4105
> URL: https://issues.apache.org/jira/browse/MESOS-4105
> Project: Mesos
> Issue Type: Bug
> Components: isolation
> Affects Versions: 0.20.0, 0.20.1, 0.21.0, 0.21.1, 0.21.2, 0.22.0, 0.22.1,
> 0.22.2, 0.23.0, 0.23.1, 0.24.0, 0.24.1, 0.25.0
> Reporter: Ian Downes
> Assignee: Cong Wang
> Priority: Critical
>
> The optional network isolator (network/port_mapping) will let corrupt TCP
> packets reach the application. This could lead to data corruption in
> applications. Normally these packets are dropped immediately by the network
> stack and do not reach the application.
> Networks may have a very low level of corrupt packets (a few per million) or
> may have very high levels if there are hardware or software errors in
> networking equipment.
> Investigation is ongoing, but an initial hypothesis is being tested:
> 1) The checksum error is correctly detected by the host interface.
> 2) The Mesos tc filters used by the network isolator redirect the packet to
> the virtual interface, even when a checksum error has occurred.
> 3) The checksum flag is cleared, either when the packet is copied to the
> veth device or when it passes across the veth pipe.
> 4) The veth inside the container does not verify the checksum, even though
> TCP RX checksum offloading is supposedly on. [This is hypothesized to be
> acceptable normally because it's receiving packets over the virtual link
> where corruption should not occur.]
> 5) The container network stack accepts the packet and delivers it to the
> application.
> Disabling TCP RX checksum offloading (CSO) on the container veth appears to
> fix this: it forces the container network stack to compute the packet
> checksums in software, so it detects the checksum errors and does not
> deliver the packet to the application.
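
The workaround described above relies on the receiver recomputing the checksum in software instead of trusting an offload flag. The following is a minimal Python sketch of that idea, not the kernel's or Mesos's actual code. It uses the standard 16-bit one's-complement Internet checksum, but the segment layout (a 2-byte checksum field followed by the payload) and the names `build_segment`, `software_verify`, `deliver`, and `trust_offload_flag` are assumptions made purely for illustration.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement Internet checksum of `data`."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_segment(payload: bytes) -> bytes:
    """Toy segment (assumed layout): 2-byte checksum field, then payload."""
    csum = internet_checksum(b"\x00\x00" + payload)
    return struct.pack("!H", csum) + payload


def software_verify(segment: bytes) -> bool:
    """An intact segment, checksum field included, checksums to zero."""
    return internet_checksum(segment) == 0


def deliver(segment: bytes, trust_offload_flag: bool) -> bool:
    """Hypothetical receive decision.

    With `trust_offload_flag` set, the packet is accepted without being
    checked (the hypothesized behaviour with RX CSO on). Otherwise the
    checksum is verified in software before delivery.
    """
    return trust_offload_flag or software_verify(segment)


good = build_segment(b"hello mesos")
corrupt = bytearray(good)
corrupt[5] ^= 0x01  # flip a single bit in the payload
corrupt = bytes(corrupt)
```

In this sketch `deliver(corrupt, trust_offload_flag=True)` accepts the damaged segment, while `deliver(corrupt, trust_offload_flag=False)` rejects it. A real TCP checksum also covers a pseudo-header and the full TCP header, which this toy layout omits. On Linux, RX checksum offload is typically toggled with `ethtool -K <iface> rx off`; the interface name depends on the setup.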