Public bug reported:

A user reports that using an i40e with intel_iommu=on with the Xenial GA
kernel causes data corruption. Using the Xenial HWE kernel or an out-of-
tree driver more recent than the version shipped with Xenial solves the
issue.

[Impact]
Corrupted data is returned from the network card intermittently. This is often
noticeable when using apt, because apt verifies checksums on what it downloads.
It often leads to failure of apt operations. Where no checksums are verified,
this could lead to silent data corruption.
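
To illustrate why checksummed transfers surface the problem while others do
not, here is a minimal sketch of a digest comparison. It is not how apt is
implemented; the function name payload_matches and the sample data are
hypothetical, and SHA-256 is assumed only as a representative checksum.

```python
import hashlib


def payload_matches(payload, expected_sha256):
    """Return True if payload hashes to the expected hex digest."""
    return hashlib.sha256(payload).hexdigest() == expected_sha256


# Hypothetical data standing in for a downloaded index file.
original = b"Package: linux-image-generic\n"
published_digest = hashlib.sha256(original).hexdigest()

# Flip a single bit to mimic corruption on the receive path.
corrupted = bytes([original[0] ^ 0x01]) + original[1:]

print(payload_matches(original, published_digest))   # intact payload
print(payload_matches(corrupted, published_digest))  # corrupted payload
```

A single flipped bit changes the digest, so the second check fails. A transfer
with no such check would accept the corrupted bytes without complaint.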

[Fix]
This was fixed somewhere post-4.4. Testing identified b32bfa17246d ("i40e: Drop 
packet split receive routine") which is part of a broader refactor. Picking 
this patch alone is sufficient to fix the issue. My theory is that iommu 
exposes an issue in the packet split receive routine and so removing it is 
sufficient to prevent the problem from occurring.

[Test]
A user tested a Xenial 4.4 kernel with this patch applied and it fixed their 
issue - no data corruption was observed. (The test repeatedly deletes the apt 
cache and then does apt update.)
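
The test loop described above could be sketched roughly as follows. This is
not the user's actual script, which is not included in this report. It assumes
the apt cache in question is /var/lib/apt/lists, that it runs as root, and that
a failure shows up either as a non-zero exit status or as the text "Hash Sum
mismatch" in the output; the function names are hypothetical.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# Assumption: "the apt cache" means the downloaded package lists.
APT_LISTS = Path("/var/lib/apt/lists")


def clear_apt_lists(lists_dir=APT_LISTS):
    """Delete everything under lists_dir so the next update refetches it."""
    for entry in Path(lists_dir).iterdir():
        if entry.is_dir():
            shutil.rmtree(str(entry))
        else:
            entry.unlink()


def apt_update_ok():
    """Run apt-get update; return False if it looks like a fetch failed."""
    result = subprocess.run(
        ["apt-get", "update"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    # Assumption: corruption surfaces as a non-zero exit or this message.
    return result.returncode == 0 and "Hash Sum mismatch" not in result.stdout


def count_failures(iterations, clear=clear_apt_lists, update=apt_update_ok):
    """Clear the cache and update `iterations` times; count failed updates."""
    failures = 0
    for _ in range(iterations):
        clear()
        if not update():
            failures += 1
    return failures


if __name__ == "__main__":
    runs = int(sys.argv[1]) if len(sys.argv) > 1 else 20
    print("%d of %d updates failed" % (count_failures(runs), runs))
```

Because the corruption is intermittent, a single clean run proves little; the
iteration count (20 by default here, an arbitrary choice) should be large
enough that the unpatched kernel fails reliably.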

[Regression Potential]
It's a messy change inside i40e, so the risk is that i40e will be broken in 
some subtle way we haven't noticed, or have performance issues. None of these 
have been observed so far.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1802421

Title:
  Xenial: data corruption when using i40e with iommu

Status in linux package in Ubuntu:
  Confirmed
