Please try the attached patch.  With it, we no longer enable ECRC
generation or checking on a device that does not support them.
Currently (without this patch), we *do* enable ECRC generation and
checking whenever _HPX allows it, i.e., whenever the platform can support
ECRC, regardless of what the device itself advertises.

The ACPI dump confirms my theory from comment #65 that the system
supplies an _HPX method with PCI_ERR_CAP_ECRC_GENE and
PCI_ERR_CAP_ECRC_CHKE set (see disassembly below).

Apparently the Intel Quick Assist card is at 85:00.0 ("Intel Corporation
DH895XCC Series QAT").  Here's the path leading to it:

  pci 0000:80:02.0: [8086:6f04]  # Xeon D PCI Express Root Port 2
  pci 0000:80:02.0: PCI bridge to [bus 83-86]
  pci 0000:83:00.0: [10b5:8724]  # PLX 8724 Upstream Port
  pci 0000:83:00.0: PCI bridge to [bus 84-86]
  pci 0000:84:00.0: [10b5:8724]  # PLX 8724 Downstream Port
  pci 0000:84:00.0: PCI bridge to [bus 85]
  pci 0000:85:00.0: [8086:0435]  # DH895XCC Series QAT

Here are the ECRC settings along the path:

  80:02.0: AERCap: GenCap+ CGenEn+ ChkCap+ ChkEn+
  83:00.0: AERCap: GenCap+ CGenEn+ ChkCap+ ChkEn+
  84:00.0: AERCap: GenCap+ CGenEn+ ChkCap+ ChkEn+
  85:00.0: AERCap: GenCap- CGenEn+ ChkCap- ChkEn+

This looks suspect because 85:00.0 claims that it does not support ECRC
Generation ("GenCap-") or ECRC Checking ("ChkCap-"), yet we set the
Enable bits for both features.  The workaround in the initial report
turns off ECRC checking in 80:02.0.  I suspect that turning off ECRC
generation and checking in 85:00.0, e.g., "setpci -s85:00.0 118.w=0"
would also be a workaround.  This patch should be the equivalent of this
setpci command.
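
To make the intended behavior concrete, here is a minimal Python model of the
register update described above.  It is only a sketch, not the patch itself:
the function name program_aer_cap is made up for illustration, and the bit
names are assumed to match the PCI_ERR_CAP_* definitions in Linux's pci_regs.h.

```python
# Bits of the AER "Advanced Error Capabilities and Control" register.
# Names are assumed to mirror Linux's include/uapi/linux/pci_regs.h.
PCI_ERR_CAP_ECRC_GENC = 0x020  # ECRC Generation Capable (read-only)
PCI_ERR_CAP_ECRC_GENE = 0x040  # ECRC Generation Enable
PCI_ERR_CAP_ECRC_CHKC = 0x080  # ECRC Check Capable (read-only)
PCI_ERR_CAP_ECRC_CHKE = 0x100  # ECRC Check Enable


def program_aer_cap(reg, hpx_and, hpx_or):
    """Hypothetical helper: apply the _HPX AND/OR masks to the current
    register value, then clear any enable bit whose capability bit is unset."""
    reg = (reg & hpx_and) | hpx_or
    if not reg & PCI_ERR_CAP_ECRC_GENC:
        reg &= ~PCI_ERR_CAP_ECRC_GENE
    if not reg & PCI_ERR_CAP_ECRC_CHKC:
        reg &= ~PCI_ERR_CAP_ECRC_CHKE
    return reg & 0xFFFFFFFF


# 85:00.0 advertises neither capability, so both enables stay clear.
print(hex(program_aer_cap(0x000, 0xFFFFFEBF, 0x0140)))  # 0x0
# A port that advertises both capabilities still gets both enables.
print(hex(program_aer_cap(0x0A0, 0xFFFFFEBF, 0x0140)))  # 0x1e0
```

With the AER AND/OR values from this system's _HPX, a device that lacks both
capabilities ends up with the enable bits clear, which is the same effect the
setpci command has on those bits.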

Here's the _HPX disassembly from dsdt.dsl (extracted from comment #66):

        Device (PCI0)
            ...
            Method (_HPX, 0, NotSerialized)  // _HPX: Hot Plug Parameter Extensions
            {
                Store ("_HPX", Debug)
                Name (SSDH, Package (0x01)
                {
                    Package (0x12)
                    {
                        0x02, 
                        0x01, 
                        0xFC000FCF, // Uncorrectable Mask AND
                        0x03A18000, // Uncorrectable Mask OR
                        0xFC000FCF, // Uncorrectable Severity AND
                        0x004E7030, // Uncorrectable Severity OR
                        0xFFFF0E3E, // Correctable Mask AND
                        0xF1C1,     // Correctable Mask OR
                        0xFFFFFEBF, // AER AND
                        0x0140,     // AER OR
                        0xFFF1,     // Device Control AND
                        0x0E,       // Device Control OR
                        0xFFFF,     // Link Control AND
                        0x00,       // Link Control OR
                        0xFFFFC010, // Secondary Uncorrectable Severity AND
                        0x1BC0,     // Secondary Uncorrectable Severity OR
                        0xFFFFC010, // Secondary Uncorrectable Mask AND
                        0x242F      // Secondary Uncorrectable Mask OR
                    }
                })
                Store (SSDH, Debug)
                Return (SSDH)
            }
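
As a quick cross-check of the "AER AND" / "AER OR" entries above, the sketch
below confirms that they touch exactly the two ECRC enable bits.  The constant
values are assumed to match PCI_ERR_CAP_ECRC_GENE and PCI_ERR_CAP_ECRC_CHKE in
Linux's pci_regs.h; the helper name is made up for illustration.

```python
PCI_ERR_CAP_ECRC_GENE = 0x040  # ECRC Generation Enable
PCI_ERR_CAP_ECRC_CHKE = 0x100  # ECRC Check Enable

# Values copied from the _HPX package in the disassembly.
AER_AND = 0xFFFFFEBF
AER_OR = 0x0140


def hpx_sets_only_ecrc_enables(aer_and, aer_or):
    """Hypothetical helper: True if the masks clear and then set exactly
    the ECRC generation and checking enable bits, and nothing else."""
    ecrc_enables = PCI_ERR_CAP_ECRC_GENE | PCI_ERR_CAP_ECRC_CHKE
    clears_only_enables = aer_and == (~ecrc_enables & 0xFFFFFFFF)
    sets_only_enables = aer_or == ecrc_enables
    return clears_only_enables and sets_only_enables


print(hpx_sets_only_ecrc_enables(AER_AND, AER_OR))  # True
```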


** Attachment added: "test patch to leave ECRC disabled when unsupported"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1571798/+attachment/4841199/+files/hpx-ecrc

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1571798

Title:
  Broadwell ECRC Support missing in Ubuntu

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Vivid:
  In Progress
Status in linux source package in Wily:
  In Progress
Status in linux source package in Xenial:
  In Progress
Status in linux source package in Yakkety:
  In Progress

Bug description:
  Here is the problem statement from the Dell team:

  When booting into Ubuntu 14.04.4 with a Broadwell CPU and an Intel
  Quick Assist Card, the memory location that corresponds to ECRC is set
  to 0x01e0, whereas the BIOS sets this location to 0x00a0 pre-OS boot.
  This causes the card to not function unless we implement the following
  workaround using setpci.

  “setpci -s AA:BB.C 160.w=0”, where AA:BB.C is the PCI Root Path for
  the Intel Quick Assist Card.

  We’ve verified the memory location is correct when booting to other
  OSes, such as RHEL 7.2 and Windows Server 2012 R2.

  If there is any information you can give as to why this may be
  occurring in Ubuntu or where we may start to debug when the memory is
  changed, we would appreciate it.
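
For reference, the two values quoted in the problem statement can be decoded
into the same flag notation lspci uses.  This is a sketch that assumes the
location in question is the AER Advanced Error Capabilities and Control
register and that the bit positions match the PCI_ERR_CAP_* definitions in
Linux's pci_regs.h; decode_ecrc is a made-up helper name.

```python
# (flag name as printed by lspci, bit mask)
ECRC_FLAGS = [
    ("GenCap", 0x020),  # ECRC Generation Capable
    ("CGenEn", 0x040),  # ECRC Generation Enable
    ("ChkCap", 0x080),  # ECRC Check Capable
    ("ChkEn", 0x100),   # ECRC Check Enable
]


def decode_ecrc(reg):
    """Hypothetical helper: render the ECRC bits of a register value
    as lspci-style flags, '+' for set and '-' for clear."""
    return " ".join(
        name + ("+" if reg & bit else "-") for name, bit in ECRC_FLAGS
    )


print(decode_ecrc(0x01E0))  # GenCap+ CGenEn+ ChkCap+ ChkEn+
print(decode_ecrc(0x00A0))  # GenCap+ CGenEn- ChkCap+ ChkEn-
```

Under those assumptions, 0x00a0 is "capable, but not enabled", and 0x01e0
is the same value with both ECRC enable bits turned on.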
