Public bug reported:

[Impact]
Under load, ThunderX systems eventually fail with:

[  282.360376] Synchronous External Abort: synchronous parity or ECC error (0x96000018) at 0x0000ffffa6eb7000
[  282.372351] Internal error: : 96000018 [#1] SMP
[  282.379152] Modules linked in: nls_iso8859_1 thunderx_edac thunderx_zip 
shpchp cavium_rng_vf cavium_rng gpio_keys uio_pdrv_genirq uio ipmi_ssif 
ipmi_devintf ipmi_msghandler sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core 
iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 
btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear nicvf 
nicpf uas usb_storage ast i2c_algo_bit ttm drm_kms_helper syscopyarea 
sysfillrect sysimgblt aes_ce_blk fb_sys_fops aes_ce_cipher drm crc32_ce 
crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce ahci libahci thunder_bgx 
thunder_xcv i2c_thunderx mdio_thunder thunderx_mmc mdio_cavium aes_neon_bs 
aes_neon_blk crypto_simd cryptd aes_arm64
[  282.467284] Process cc1 (pid: 39700, stack limit = 0x00000000e0c44146)
[  282.477172] CPU: 25 PID: 39700 Comm: cc1 Not tainted 4.15.0-75-generic #85+lp1857074.1
[  282.488379] Hardware name: Cavium ThunderX CRB/To be filled by O.E.M., BIOS 5.11 12/12/2012
[  282.500121] pstate: 80000005 (Nzcv daif -PAN -UAO)
[  282.508297] pc : __arch_copy_to_user+0x13c/0x248
[  282.516430] lr : cp_new_stat+0x140/0x178
[  282.523768] sp : ffff00002e4d3d40
[  282.530369] x29: ffff00002e4d3d40 x28: ffff801f51fa2d00 
[  282.538988] x27: ffff000008b52000 x26: 0000000000000050 
[  282.548031] x25: 0000000000000124 x24: 0000000000000015 
[  282.556872] x23: 0000000000000000 x22: 000000002e4d3d88 
[  282.565449] x21: ffff801f51fa2d00 x20: ffff000009588000 
[  282.574109] x19: ffff00002e4d3e30 x18: 0000ffffa87e7a70 
[  282.582790] x17: 0000ffffa8756110 x16: ffff0000082f4448 
[  282.591433] x15: 0000000000000000 x14: 0000000000000012 
[  282.599986] x13: 00682e6c746e6366 x12: 2f78756e696c2f69 
[  282.608730] x11: 0000000000000000 x10: 0000000000000cf0 
[  282.617283] x9 : 0000000000001000 x8 : 00000001000081a4 
[  282.625839] x7 : 0000000001001a2b x6 : 000000002e4d3da0 
[  282.634238] x5 : 000000002e4d3e08 x4 : 0000000000000008 
[  282.642754] x3 : 0000000000000802 x2 : fffffffffffffff8 
[  282.651250] x1 : ffff00002e4d3d90 x0 : 000000002e4d3d88 
[  282.660013] Call trace:
[  282.665421]  __arch_copy_to_user+0x13c/0x248
[  282.672979]  SyS_newfstat+0x58/0x88
[  282.679272]  el0_svc_naked+0x30/0x34
[  282.685605] Code: a8c12027 a88120c7 d503201f d503201f (a8c12829) 
[  282.694411] ---[ end trace 863693cf0c3fd297 ]---
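
For readers unfamiliar with the syndrome value in the first line of the trace, here is a minimal sketch of how 0x96000018 splits into fields, assuming the standard AArch64 ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and for data aborts WnR in ISS bit 6 and DFSC in ISS bits 5:0). The helper name decode_esr is hypothetical and is not part of the kernel or of this report.

```python
def decode_esr(esr: int) -> dict:
    """Split a 32-bit AArch64 ESR_ELx value into its top-level fields."""
    iss = esr & 0x1FFFFFF          # bits [24:0]: instruction specific syndrome
    return {
        "ec": (esr >> 26) & 0x3F,  # bits [31:26]: exception class
        "il": (esr >> 25) & 0x1,   # bit 25: instruction length flag
        "iss": iss,
        "wnr": (iss >> 6) & 0x1,   # data aborts only: 1 = write, 0 = read
        "dfsc": iss & 0x3F,        # data aborts only: fault status code
    }


# Value taken from the "Synchronous External Abort" line above.
fields = decode_esr(0x96000018)
for name, value in fields.items():
    print(f"{name:>4} = {value:#x}")
```

With this layout, EC comes out as 0x25 (a data abort taken without a change in exception level, which fits the pc being in __arch_copy_to_user) and DFSC as 0x18, the code the kernel reports as "synchronous parity or ECC error".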

[Test Case]
We found this by running a reboot/kernel build loop. (The reboot may be
unnecessary.) Code to automate this setup is at:
  https://code.launchpad.net/~dannf/+git/kernel-build-reboot-loop
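
When running such a loop, it helps to check the kernel log for the failure signature after each iteration. The following is a rough sketch of that check only, not the code from the repository above: SEA_PATTERN and find_aborts are hypothetical names, and the pattern assumes the message format shown in [Impact].

```python
import re

# Matches the first line of the oops shown in [Impact]. The \s* before the
# parenthesis also tolerates the line wrap that mail clients introduce.
SEA_PATTERN = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+Synchronous External Abort: (?P<reason>.*?)\s*"
    r"\((?P<esr>0x[0-9a-fA-F]+)\) at (?P<addr>0x[0-9a-fA-F]+)"
)


def find_aborts(log_text: str) -> list:
    """Return one dict per Synchronous External Abort found in log_text."""
    return [m.groupdict() for m in SEA_PATTERN.finditer(log_text)]


sample = (
    "[  282.360376] Synchronous External Abort: synchronous parity or ECC error "
    "(0x96000018) at 0x0000ffffa6eb7000\n"
    "[  282.372351] Internal error: : 96000018 [#1] SMP\n"
)
for hit in find_aborts(sample):
    print(hit["ts"], hit["reason"], hit["esr"], hit["addr"])
```

In a real loop, log_text would presumably be the captured output of dmesg (or the serial console log) collected after each build.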

[Fix]
[Regression Risk]

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Triaged

** Affects: linux (Ubuntu Bionic)
     Importance: Undecided
         Status: Confirmed

** Affects: linux (Ubuntu Disco)
     Importance: Undecided
         Status: Triaged

** Affects: linux (Ubuntu Eoan)
     Importance: Undecided
         Status: Triaged

** Affects: linux (Ubuntu Focal)
     Importance: Undecided
         Status: Triaged

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Eoan)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Disco)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Bionic)
       Status: New => Confirmed

** Changed in: linux (Ubuntu Disco)
       Status: New => Triaged

** Changed in: linux (Ubuntu Eoan)
       Status: New => Triaged

** Changed in: linux (Ubuntu Focal)
       Status: New => Triaged

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1860013

Title:
  [thunderx] Synchronous External Abort: synchronous parity or ECC error

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Disco:
  Triaged
Status in linux source package in Eoan:
  Triaged
Status in linux source package in Focal:
  Triaged
