[ https://issues.apache.org/jira/browse/MESOS-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Downes updated MESOS-4105:
------------------------------
    Assignee: Cong Wang

> Network isolator causes corrupt packets to reach application
> ------------------------------------------------------------
>
>                 Key: MESOS-4105
>                 URL: https://issues.apache.org/jira/browse/MESOS-4105
>             Project: Mesos
>          Issue Type: Bug
>          Components: isolation
>    Affects Versions: 0.20.0, 0.20.1, 0.21.0, 0.21.1, 0.21.2, 0.22.0, 0.22.1, 
> 0.22.2, 0.23.0, 0.23.1, 0.24.0, 0.24.1, 0.25.0
>            Reporter: Ian Downes
>            Assignee: Cong Wang
>            Priority: Critical
>
> The optional network isolator (network/port_mapping) will let corrupt TCP 
> packets reach the application. This could lead to data corruption in 
> applications. Normally these packets are dropped immediately by the network 
> stack and do not reach the application. 
> Networks may have a very low level of corrupt packets (a few per million) or 
> very high levels if there are hardware or software errors in networking 
> equipment.
> Investigation is ongoing, but an initial hypothesis is being tested:
> 1) The checksum error is correctly detected by the host interface.
> 2) The Mesos tc filters used by the network isolator redirect the packet to 
> the virtual interface, even when a checksum error has occurred.
> 3) The checksum-error flag is cleared, either when the packet is copied to the 
> veth device or while it passes across the veth pair.
> 4) The veth inside the container does not verify the checksum, even though 
> TCP RX checksum offloading is supposedly on. [This is hypothesized to be 
> normally acceptable because the packet arrives over a virtual link, where 
> corruption should not occur.]
> 5) The container network stack accepts the packet and delivers it to the 
> application.
> Disabling TCP RX checksum offloading (CSO) on the container veth appears to 
> fix this: it forces the container network stack to compute packet checksums 
> in software, so it detects the checksum errors and does not deliver the 
> corrupt packets to the application.
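To illustrate why the workaround helps, here is a minimal Python sketch. It is not Mesos or kernel code. It computes the RFC 1071 Internet checksum in software and contrasts a receiver that trusts RX checksum offloading with one that verifies in software. The segment layout (a 2-byte checksum field followed by the payload, with no TCP pseudo-header) and the helper names (build_segment, flip_bit, container_accepts) are hypothetical simplifications. The rx_csum_offload=True branch simply encodes steps 3 and 4 of the hypothesis: the error indication has been lost, and nothing re-checks the packet.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's-complement sum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add the next big-endian word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def build_segment(payload: bytes) -> bytes:
    """Hypothetical segment: 2-byte checksum field followed by the payload."""
    checksum = internet_checksum(b"\x00\x00" + payload)  # field zeroed while summing
    return checksum.to_bytes(2, "big") + payload


def flip_bit(segment: bytes, index: int) -> bytes:
    """Simulate a single-bit corruption on the wire."""
    corrupted = bytearray(segment)
    corrupted[index] ^= 0x01
    return bytes(corrupted)


def container_accepts(segment: bytes, rx_csum_offload: bool) -> bool:
    """Toy model of the receive path inside the container."""
    if rx_csum_offload:
        # Hypothesis steps 3-4: the error indication was lost on the way in,
        # and the container veth trusts offloading, so nothing re-checks.
        return True
    # Workaround: with RX CSO disabled, the stack verifies in software. An
    # intact segment re-checksums (checksum field included) to zero.
    return internet_checksum(segment) == 0


if __name__ == "__main__":
    good = build_segment(b"hello, mesos")
    bad = flip_bit(good, 5)  # corrupt one payload byte
    for offload in (True, False):
        print(f"rx_csum_offload={offload}: "
              f"good={container_accepts(good, offload)} "
              f"bad={container_accepts(bad, offload)}")
```

Running it prints that the corrupted segment is accepted only when rx_csum_offload is True. On a real host, RX checksum offloading is commonly toggled with `ethtool -K <dev> rx off` inside the relevant network namespace. Treat that as a pointer, not as the exact procedure used in this investigation.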



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
