** Description changed:

  [Impact]
  In the Zesty and Artful configs, EDAC_MM_EDAC is set to =m, which
  disables EDAC_GHES. Customers using RAS on ARM64 may want this
  functionality.
  
  According to a RAS expert at QTI, EDAC_GHES is essential for ARMv8.0
  Server systems, as it enables firmware-first error handling of memory
  and CPU errors. Due to a lack of standard RAS architecture (or machine
  check architecture equivalent) on ARMv8.0 systems, APEI/GHES is the only
  mechanism available for reporting hardware errors (e.g. memory and CPU
  errors). This enables reporting of hardware errors, and also helps
  enable memory fault recovery mechanisms to extend the life of the system
  by offlining pages when recoverable uncorrected errors are encountered.
  Note that other ARM vendors will be going in this direction for hardware
  error handling.
  
  [Test]
+ Test kernel available in https://launchpad.net/~centriq-team/+archive/ubuntu/lp1706141
+ 
  Boot the kernel and check dmesg for the following:
  $ dmesg | grep -i -E "edac|hest|ghes"
  [    0.000000] ACPI: HEST 0x0000000009160000 000288 (v01 QCOM   QDF2400  00000001 INTL 20150515)
  [    0.620278] HEST: Table parsing has been initialized.
  [    4.178298] EDAC MC: Ver: 3.0.0
  [    5.664499] ghes_edac: This EDAC driver relies on BIOS to enumerate memory and get error reports.
  [    5.673371] ghes_edac: Unfortunately, not all BIOSes reflect the memory layout correctly.
  [    5.681542] ghes_edac: So, the end result of using this driver varies from vendor to vendor.
  [    5.689972] ghes_edac: If you find incorrect reports, please contact your hardware vendor
  [    5.698142] ghes_edac: to correct its BIOS.
  [    5.702320] ghes_edac: This system has 12 DIMM sockets.
  [    5.707717] EDAC MC0: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
  [    5.717264] EDAC MC1: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
  [    5.726806] EDAC MC2: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
  [    5.736344] EDAC MC3: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
  [    5.745883] EDAC MC4: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
  [    5.755469] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
  
  [Fix]
  1. Apply RAS patch series submitted for SRU in Bug #1696570
  2. Set config option EDAC_MM_EDAC=y for ARM64; this automatically
     sets EDAC_GHES=y
- 3. Remove edac_core from 
+ 3. Remove edac_core from
  debian.master/abi/<ver>/arm64/generic.modules
  
  [Regression Potential]
  The config change is limited to the ARM64 architecture and does not
  impact any other architecture. Potential for regressions is low.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1706141

Title:
  [ARM64] config EDAC_GHES=y depends on EDAC_MM_EDAC=y

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  [Impact]
  In the Zesty and Artful configs, EDAC_MM_EDAC is set to =m, which
  disables EDAC_GHES. Customers using RAS on ARM64 may want this
  functionality.

  According to a RAS expert at QTI, EDAC_GHES is essential for ARMv8.0
  Server systems, as it enables firmware-first error handling of memory
  and CPU errors. Due to a lack of standard RAS architecture (or machine
  check architecture equivalent) on ARMv8.0 systems, APEI/GHES is the
  only mechanism available for reporting hardware errors (e.g. memory
  and CPU errors). This enables reporting of hardware errors, and also
  helps enable memory fault recovery mechanisms to extend the life of
  the system by offlining pages when recoverable uncorrected errors are
  encountered. Note that other ARM vendors will be going in this
  direction for hardware error handling.

  [Test]
  Test kernel available in https://launchpad.net/~centriq-team/+archive/ubuntu/lp1706141

  Boot the kernel and check dmesg for the following:
  $ dmesg | grep -i -E "edac|hest|ghes"
  [    0.000000] ACPI: HEST 0x0000000009160000 000288 (v01 QCOM   QDF2400  00000001 INTL 20150515)
  [    0.620278] HEST: Table parsing has been initialized.
  [    4.178298] EDAC MC: Ver: 3.0.0
  [    5.664499] ghes_edac: This EDAC driver relies on BIOS to enumerate memory and get error reports.
  [    5.673371] ghes_edac: Unfortunately, not all BIOSes reflect the memory layout correctly.
  [    5.681542] ghes_edac: So, the end result of using this driver varies from vendor to vendor.
  [    5.689972] ghes_edac: If you find incorrect reports, please contact your hardware vendor
  [    5.698142] ghes_edac: to correct its BIOS.
  [    5.702320] ghes_edac: This system has 12 DIMM sockets.
  [    5.707717] EDAC MC0: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
  [    5.717264] EDAC MC1: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
  [    5.726806] EDAC MC2: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
  [    5.736344] EDAC MC3: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
  [    5.745883] EDAC MC4: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
  [    5.755469] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
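
  The manual check above could be scripted roughly as follows. This is
  only a sketch: the function name check_ghes_edac_log and the choice of
  three marker lines (HEST parsing, an EDAC MC registration by ghes_edac,
  and the GHES firmware-first message) are assumptions made for
  illustration, taken from the sample output rather than from any
  official test plan.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin and reports whether
# three marker lines from the sample dmesg output are present.
# Returns 0 only if every marker is found.
check_ghes_edac_log() {
    log=$(cat)
    missing=0
    for pattern in \
        'HEST: Table parsing has been initialized' \
        'EDAC MC[0-9]+: Giving out device to module ghes_edac' \
        'GHES: APEI firmware first mode is enabled'
    do
        if printf '%s\n' "$log" | grep -Eq "$pattern"; then
            echo "found:   $pattern"
        else
            echo "missing: $pattern"
            missing=1
        fi
    done
    return "$missing"
}

# Usage on the booted test kernel:
#   dmesg | check_ghes_edac_log
```

  The exact message text may differ between kernel versions, so the
  patterns may need adjusting for kernels other than the one tested here.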

  [Fix]
  1. Apply RAS patch series submitted for SRU in Bug #1696570
  2. Set config option EDAC_MM_EDAC=y for ARM64; this automatically
     sets EDAC_GHES=y
  3. Remove edac_core from
  debian.master/abi/<ver>/arm64/generic.modules
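
  After step 2, the resulting kernel config can be checked with a small
  script along these lines. This is a sketch, not part of the proposed
  fix: the function name check_edac_config is hypothetical, and it
  assumes the built kernel's config is available as a plain-text file
  (on Ubuntu, normally /boot/config-<release>).

```shell
#!/bin/sh
# Hypothetical helper: checks a kernel config file for the two options
# named in the fix. The config file path is passed as the first argument.
# Returns 0 only if both options are built in (=y).
check_edac_config() {
    config_file=$1
    status=0
    for option in CONFIG_EDAC_MM_EDAC CONFIG_EDAC_GHES; do
        if grep -q "^${option}=y\$" "$config_file"; then
            echo "ok:      ${option}=y"
        else
            echo "not set: ${option}=y"
            status=1
        fi
    done
    return "$status"
}

# Usage on an installed kernel:
#   check_edac_config "/boot/config-$(uname -r)"
```

  A config with EDAC_MM_EDAC=m, as described under [Impact], makes the
  function return non-zero.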

  [Regression Potential]
  The config change is limited to the ARM64 architecture and does not
  impact any other architecture. Potential for regressions is low.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1706141/+subscriptions

