This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-xenial' to 'verification-done-xenial'. If the
problem still exists, change the tag 'verification-needed-xenial' to
'verification-failed-xenial'.

If verification is not done within 5 working days from today, this fix
will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: verification-needed-artful

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1715519

Title:
  bnx2x_attn_int_deasserted3:4323 MC assert!

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Trusty:
  Fix Committed
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Artful:
  Fix Committed

Bug description:
  SRU Justification
  =================

  A ppc64le system runs as a guest under PowerVM. This guest has a bnx2x
  card attached, and uses openvswitch to bridge an ibmveth interface for
  traffic from other LPARs.

  We see the following crash sometimes when running netperf:
  May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_attn_int_deasserted3:4323(enP24p1s0f2)]MC assert!
  May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_mc_assert:720(enP24p1s0f2)]XSTORM_ASSERT_LIST_INDEX 0x2
  May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_mc_assert:736(enP24p1s0f2)]XSTORM_ASSERT_INDEX 0x0 = 0x00000000 0x25e42a7e 0x00462a38 0x00010052
  May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_mc_assert:750(enP24p1s0f2)]Chip Revision: everest3, FW Version: 7_13_1
  May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_attn_int_deasserted3:4329(enP24p1s0f2)]driver assert
  May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_panic_dump:923(enP24p1s0f2)]begin crash dump -----------------
  ... (dump of registers follows) ...

  Subsequent debugging reveals that the packets causing the issue come
  through the ibmveth interface - from the AIX LPAR. The veth protocol
  is 'special' - communication between LPARs on the same chassis can use
  very large (64k) frames to reduce overhead. Normal networks cannot
  handle such large packets, so traditionally, the VIOS partition would
  signal to the AIX partitions that it was 'special', and AIX would send
  regular, ethernet-sized packets to VIOS, which VIOS would then send
  out.

  This signalling between VIOS and AIX is done in a way that is not
  standards-compliant, and so was never made part of Linux. Instead, the
  Linux driver has always understood large frames and passed them up the
  network stack.

  In some cases (e.g. with TCP), multiple TCP segments are coalesced
  into one large packet. In Linux, this goes through the generic receive
  offload code, using a mechanism similar to GSO. These segments can be
  very large, which shows up as a very large MSS (maximum segment size)
  or gso_size.

  Normally, the large packet is simply passed to whatever network
  application on Linux is going to consume it, and everything is OK.

  However, in this case, the packets go through Open vSwitch, and are
  then passed to the bnx2x driver. The bnx2x driver/hardware supports
  TSO and GSO, but with a restriction: the maximum segment size is
  limited to around 9700 bytes. Normally this is more than adequate.
  However, if a large packet with very large (>9700 byte) TCP segments
  arrives through ibmveth, and is passed to bnx2x, the hardware will
  panic.

  [Impact]

  The bnx2x card panics, requiring a power cycle to restore functionality.

  The workaround is turning off TSO, which prevents the crash as the
  kernel resegments *all* packets in software, not just ones that are
  too big. This has a performance cost.
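
  To illustrate what software resegmentation means for one oversized
  segment, here is a minimal user-space sketch in Python. It is not
  kernel code: the function name resegment() and the MSS of 1448 bytes
  are assumptions chosen only for illustration.

```python
from typing import List


def resegment(payload: bytes, mss: int) -> List[bytes]:
    """Split one large TCP payload into chunks of at most `mss` bytes.

    Hypothetical model of software segmentation; the real kernel also
    rebuilds headers and checksums for every chunk, which is where the
    performance cost comes from.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


# Illustrative values only: a 65000-byte payload such as might arrive
# over ibmveth, cut down to an assumed Ethernet-sized MSS of 1448 bytes.
segments = resegment(bytes(65000), 1448)
```

  With TSO turned off, this kind of splitting is done in software for
  every outgoing TCP packet, including the ones the card could have
  segmented itself.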

  [Fix]

  Test packet size in bnx2x feature check path and disable GSO if it is
  too large. To do this we move a function from one file to another and
  add another in the networking core.
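
  The idea of the check can be sketched as a small user-space model in
  Python. This is not the driver code: the Packet class, the flag
  values and the function name features_check() are hypothetical, and
  the 9700-byte limit is simply the approximate figure quoted in the
  description above.

```python
from dataclasses import dataclass

# Approximate hardware limit quoted in this bug description; the exact
# value and what it covers in the real driver are assumptions here.
HW_MAX_SEGMENT_LEN = 9700

# Hypothetical feature flags, used only by this model.
FEATURE_TSO = 0x1
FEATURE_GSO = 0x2
FEATURE_CSUM = 0x4
GSO_FEATURE_MASK = FEATURE_TSO | FEATURE_GSO


@dataclass
class Packet:
    """Hypothetical stand-in for a socket buffer."""
    header_len: int  # bytes of headers prepended to every segment
    gso_size: int    # payload bytes per segment; 0 means not a GSO packet


def features_check(pkt: Packet, features: int) -> int:
    """Return the offload features that are safe to use for this packet.

    If a single segment (headers plus gso_size) would exceed the
    hardware limit, clear the segmentation offload bits so the stack
    segments this packet in software. Other packets are unaffected.
    """
    if pkt.gso_size and pkt.header_len + pkt.gso_size > HW_MAX_SEGMENT_LEN:
        return features & ~GSO_FEATURE_MASK
    return features
```

  Unlike the workaround of turning TSO off, only packets whose segments
  are too large lose offload in this model; everything else still uses
  the hardware path.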

  [Regression Potential]

  A/B/X: The changes to the network core are easily reviewed. The
  changes to behaviour are limited to the bnx2x card driver. The most
  likely failure case is a false-positive on the size check, which
  would lead to a performance regression only.

  T: This also involves a different change to the networking core to add
  the old-style GSO checking, which is more invasive. However the
  changes are simple and easily reviewed.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1715519/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
