This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag 'verification-needed-
artful' to 'verification-done-artful'. If the problem still exists,
change the tag 'verification-needed-artful' to 'verification-failed-
artful'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how
to enable and use -proposed. Thank you!

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1723127

Title:
  Intel i40e PF reset due to incorrect MDD detection (continues...)

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Trusty:
  Won't Fix
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Artful:
  Fix Committed
Status in linux source package in Bionic:
  Fix Released

Bug description:
  [impact]

  The i40e driver sometimes causes a "malicious device" event that the
  firmware detects, which causes the firmware to reset the nic, causing
  an interruption in the network connection - which can cause further
  problems, e.g. if the interface is in a bond; the reset will at least
  cause a temporary interruption in network traffic.

  [fix]

  The upstream patch to fix this adjusts how the driver fragments TX
  data; the "malicious driver" detected by the firmware is a result of
  incorrectly crafted TX fragment descriptors (the firmware has specific
  complicated restrictions on this).  The patch is from Intel, and they
  suggested this specific patch to address the problem; additionally I
  have checked with someone who reported this to me and provided a test
  kernel with the patch to them, and they have been able to run ~6 weeks
  so far without reproducing the issue; previously they could reproduce
  it as quickly as a day, but usually within 2-3 weeks.

  [test case]

  the bug is unfortunately very difficult to reproduce, but as shown in
  this (and previous) bug comments, some users of the i40e have traffic
  that can consistently reproduce the problem (although usually on the
  order of days, or longer, to reproduce).  Reproducing is easily
  detected, as the nw traffic will be interrupted and the system logs
  will contain a message like:

  i40e 0000:02:00.1: TX driver issue detected, PF reset issued

  [regression potential]

  the patch for this alters how tx is fragmented by the driver, so a
  possible regression would likely cause problems in TX traffic and/or
  additional "malicious device detection" events.


  [original description]

  This is a continuation from bug 1713553; a patch was added in that bug
  to attempt to fix this, and it may have helped reduce the issue but
  appears not to have fixed it, based on more reports.

  The issue is the i40e driver, when TSO is enabled, sometimes sees the
  NIC firmware issue a "MDD event" where MDD is "Malicious Driver
  Detection".  This is vaguely defined in the i40e spec, but with no way
  to tell what the NIC actually saw that it didn't like.  So, the driver
  can do nothing but print an error message and reset the PF (or VF).
  Unfortunately, this resets the interface, which causes an interruption
  in network traffic flow while the PF is resetting.

  See bug 1713553 for more details.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1723127/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to