This bug was fixed in the package ipxe - 1.0.0+git-20180124.fbe8c52d-0ubuntu4.1

---------------
ipxe (1.0.0+git-20180124.fbe8c52d-0ubuntu4.1) cosmic; urgency=medium

  * d/p/0005-strip-802.1Q-VLAN-0-priority-tags.patch: Strip 802.1Q VLAN 0
    priority tags; Fixes PXE when VLAN tag is 0. (LP: #1805920)

 -- Andres Rodriguez <andres...@ubuntu.com>  Mon, 10 Dec 2018 16:26:42 -0500

** Changed in: ipxe (Ubuntu Cosmic)
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1805920

Title:
  iPXE ignores vlan 0 traffic

Status in MAAS:
  Invalid
Status in ipxe package in Ubuntu:
  Fix Released
Status in ipxe-qemu-256k-compat package in Ubuntu:
  Invalid
Status in linux package in Ubuntu:
  Invalid
Status in ipxe source package in Trusty:
  Won't Fix
Status in ipxe source package in Xenial:
  Won't Fix
Status in ipxe source package in Bionic:
  Fix Committed
Status in ipxe source package in Cosmic:
  Fix Released
Status in ipxe source package in Disco:
  Fix Released

Bug description:
  [Impact]

   * VLAN 0 is special (for QoS actually, not a real VLAN)
   * Some components in the stack accidentally strip it, so does ipxe in
     this case.
   * Fix by porting a patch that other distributions carry. Upstream did
     not adopt the suggested change, but it is needed for the use case
     affected by this bug (thanks, Andres).

  [Test Case]

   * Comment #42 contains a virtual test setup that helps to understand the
     case, but it does NOT trigger the issue. Triggering it requires special
     switch hardware that adds VLAN 0 tags for QoS. Therefore Vern (the
     reporter) will test the fix at a customer site whose hardware is
     affected by this issue.

  [Regression Potential]

   * The only reference to VLAN tags on iPXE boot that we found was on iBFT
     boot for SCSI, we tested that in comment #34 and it still worked fine.
   * We didn't see such cases in review, but some use cases might make
     unexpected use of the headers that are now stripped. Such use would
     seem wrong, though.

  [Other Info]

   * n/a

  ---

  I have three MAAS rack/region nodes which are blades in a Cisco UCS
  chassis. This is an FCE deployment where MAAS has two DHCP servers,
  infra1 is the primary and infra3 is the secondary. The pod VMs on
  infra1 and infra3 PXE boot fine but the pod VMs on infra2 fail to PXE
  boot. If I reconfigure the subnet to provide DHCP on infra2 (either as
  primary or secondary) then the pod VMs on infra2 will PXE boot but the
  pod VMs on the demoted infra node (that no longer serves DHCP) now
  fail to PXE boot.

  While commissioning a pod VM on infra2 I captured network traffic with
  tcpdump on the vnet interface.

  Here is the dump when the PXE boot fails (no dhcp server on infra2):
  https://pastebin.canonical.com/p/THW2gTSv4S/

  Here is the dump when PXE boot succeeds (when infra2 is serving dhcp):
  https://pastebin.canonical.com/p/HH3XvZtTGG/

  The only difference I can see is that in the unsuccessful scenario,
  the reply is an 802.1q packet -- it's got a vlan tag for vlan 0.
  Normally vlan 0 traffic is passed as if it is not tagged and indeed, I
  can ping between the blades with no problem. Outgoing packets are
  untagged but incoming packets are tagged vlan 0 -- but the ping works.
  It seems vlan 0 is used as part of 802.1p to set the priority of
  packets. This is separate from vlans; it just happens to use the
  802.1q ethertype to carry the priority tag.
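
  To make the tag layout concrete, here is a minimal sketch (not taken
  from iPXE or from the captures above) that decodes the 802.1Q tag of a
  raw Ethernet frame. The function names parse_vlan_tag and
  is_priority_tagged are hypothetical. The 16-bit tag control field
  holds a 3-bit priority (PCP), a 1-bit drop-eligible flag (DEI) and a
  12-bit VLAN ID; a VLAN ID of 0 means the tag carries only a priority.

```python
import struct

ETH_P_8021Q = 0x8100  # TPID value that marks an 802.1Q tag


def parse_vlan_tag(frame: bytes):
    """Return (pcp, dei, vid) if the frame carries an 802.1Q tag, else None.

    Layout assumed: 6-byte destination MAC, 6-byte source MAC, then either
    the EtherType (untagged) or TPID 0x8100 followed by the 16-bit TCI.
    """
    if len(frame) < 18:  # too short to hold MACs + tag + inner EtherType
        return None
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != ETH_P_8021Q:
        return None
    pcp = tci >> 13          # 3-bit priority code point (802.1p)
    dei = (tci >> 12) & 0x1  # 1-bit drop eligible indicator
    vid = tci & 0x0FFF       # 12-bit VLAN identifier
    return pcp, dei, vid


def is_priority_tagged(frame: bytes) -> bool:
    """True when the frame has an 802.1Q tag whose VLAN ID is 0."""
    tag = parse_vlan_tag(frame)
    return tag is not None and tag[2] == 0
```

  A frame for which is_priority_tagged() returns True is the kind of
  reply seen in the failing capture: tagged on the wire, but not a
  member of any real vlan.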

  Someone confirmed to me that the iPXE source drops all packets that
  are vlan tagged.
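
  The behaviour being asked for can be sketched as follows. This is an
  illustration only, written in Python rather than iPXE's C, and the
  function name strip_priority_tag is hypothetical. It assumes the
  receive path described above (tagged frames are dropped) and changes
  one thing: a tag with VLAN ID 0 is removed and the frame is handled
  as if it had arrived untagged.

```python
import struct

ETH_P_8021Q = 0x8100  # TPID value that marks an 802.1Q tag


def strip_priority_tag(frame: bytes):
    """Model of a receive path that tolerates VLAN 0 priority tags.

    Returns the frame unchanged if it is untagged, the frame with the
    4-byte tag removed if the VLAN ID is 0, and None (drop) for frames
    tagged with a non-zero VLAN ID or too short to parse.
    """
    if len(frame) < 14:  # shorter than an Ethernet header
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETH_P_8021Q:
        return frame  # untagged: pass through untouched
    if len(frame) < 18:  # tag present but truncated
        return None
    (tci,) = struct.unpack("!H", frame[14:16])
    if tci & 0x0FFF != 0:
        return None  # real VLAN: still dropped in this model
    # Remove TPID + TCI so the inner EtherType follows the MAC addresses.
    return frame[:12] + frame[16:]
```

  With this in place, the vlan 0 reply from the failing capture would
  reach the DHCP code as an ordinary untagged frame, while traffic on
  real vlans is treated as before.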

  The customer is unable to figure out why the packets between blades
  are getting vlan tagged, so we either need to figure out how to allow
  iPXE to accept vlan 0 or the customer will need to use different
  equipment for the MAAS nodes.

  I found a conversation on the ipxe-devel mailing list that suggested a
  commit was submitted and signed off but that was from 2016 so I'm not
  sure what became of it. Notable messages in the thread:

  http://lists.ipxe.org/pipermail/ipxe-devel/2016-April/004916.html
  http://lists.ipxe.org/pipermail/ipxe-devel/2016-July/005099.html

  Would it be possible to install a local patch as part of the FCE
  deployment? I suspect the patch(es) mentioned in the above thread
  would require some modification to apply properly.

To manage notifications about this bug go to:
https://bugs.launchpad.net/maas/+bug/1805920/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
