On Wed, Jan 15, 2020 at 11:28 PM Juerg Haefliger
<juerg.haefli...@canonical.com> wrote:
>
> On Thu, 16 Jan 2020 02:14:16 -0000
> dann frazier <dann.fraz...@canonical.com> wrote:
>
> > I built a kernel with the proposed patches[*] and ran a reboot/kernel
> > compile test on 4 systems. The tests survived 46 total iterations
> > (~12/system) before I interrupted. Two systems failed with "Synchronous
> > External Abort: synchronous parity or ECC error" errors.
> >
> > I've reverted the systems back to 4.15.0-70 - the kernel before the
> > cpufeature/errata patches that caused this - to see if these SEA errors
> > are a regression.
> >
> > [*] https://lists.ubuntu.com/archives/kernel-team/2020-January/106909.html
> >
>
> I've run 75 iterations of reboot/compile-kernel and encountered 3 gcc
> segmentation faults. Unfortunately, my test didn't capture the dmesg log but
> it's likely that these are due to the ECC problems we're (still?) seeing.

I've seen those on every machine so far when run long enough. Since I
believe we've clearly demonstrated that this is an unrelated failure,
I've split it out into bug 1860013 - let's track it there.
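
For future runs, the missing dmesg capture could be covered by a small
wrapper along the lines below. This is only a sketch: the function name
run_iteration and its parameters are made up for illustration, and it is
not the harness either of us actually used. It runs one build command and,
if that command fails (for example on a gcc segfault), saves the output of
a log command such as dmesg to a file before anything else happens.

```python
import subprocess
from pathlib import Path


def run_iteration(build_cmd, log_cmd, log_path):
    """Run one build; on failure, save the output of log_cmd to log_path.

    build_cmd and log_cmd are argument lists as accepted by subprocess.run.
    Returns True if the build succeeded, False otherwise.
    """
    build = subprocess.run(build_cmd)
    if build.returncode == 0:
        return True
    # The build failed: snapshot the kernel log right away, before a reboot
    # or ring-buffer wraparound can lose the relevant messages.
    log = subprocess.run(
        log_cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    Path(log_path).write_text(log.stdout)
    return False
```

A hypothetical call would be run_iteration(["make", "-j8"], ["dmesg"],
"dmesg-failure.log"); the build command and file name here are placeholders.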

> There was also another issue during one of the reboots which is probably
> unrelated and due to a flaky BMC:

Let's track that in bug 1857073. Even if it is a flaky BMC, the IPMI
driver should handle the failure gracefully.
Did you see this on host 'wright' as well?

 -dann

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1857074

Title:
  Cavium ThunderX CN88XX Panic : Unknown reason

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  Confirmed

Bug description:
  Series: Bionic
  Kernel: 4.15.0-74.84 linux-generic
  Steps to reproduce:  Install 4.15.0-74.84 Kernel and boot the system.

  The following crash was observed while testing the proposed kernel for the 2019.12.02 SRU Cycle.
  This kernel was built to include fixes for the following bugs:

    * [Regression] Bionic kernel 4.15.0-71.80 can not boot on ThunderX
      (LP: #1853326)
      - Revert "arm64: Use firmware to detect CPUs that are not affected by
        Spectre-v2"
      - Revert "arm64: Get rid of __smccc_workaround_1_hvc_*"

    * [Regression] Bionic kernel 4.15.0-71.80 can not boot on ThunderX2 and
      Kunpeng920 (LP: #1852723)
      - SAUCE: arm64: capabilities: Move setup_boot_cpu_capabilities() call to
        correct place

  The following crash appears to be a NEW bug, not related to the prior bugs listed above.
  This bug DOES NOT APPEAR to be related to LP#1857073.

  This is another NEW BUG.

  Hostname: Starmie

  Probable Cause is unknown at this point and still under investigation.

  [  OK  ] Found device WDC_WD5003ABYZ-011FA0 efi.
           Mounting /boot/efi...
  [  OK  ] Mounted /boot/efi.
  [  OK  ] Reached target Local File Systems.
           Starting AppArmor initialization...
           Starting Tell Plymouth To Write Out Runtime Data...
           Starting ebtables ruleset management...
  [   20.942427] kernel BUG at /build/linux-pWET3k/linux-4.15.0/fs/buffer.c:1240!
  [   20.951416] Internal error: Oops - BUG: 0 [#1] SMP
  [   20.958153] Modules linked in: nls_iso8859_1 thunderx_edac thunderx_zip cavium_rng_vf shpchp cavium_rng gpio_keys uio_pdrv_genirq ipmi_ssif uio ipmi_devintf ipmi_msghandler sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear nicvf nicpf ast i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect aes_ce_blk sysimgblt fb_sys_fops aes_ce_cipher crc32_ce drm crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce ahci thunder_bgx libahci thunder_xcv i2c_thunderx mdio_thunder thunderx_mmc mdio_cavium aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
  [   21.044326] Process systemd (pid: 1, stack limit = 0x000000005af6f18b)
  [   21.053858] CPU: 1 PID: 1 Comm: systemd Not tainted 4.15.0-74-generic #84-Ubuntu
  [   21.063931] Hardware name: Cavium ThunderX CRB/To be filled by O.E.M., BIOS 5.11 12/12/2012
  [   21.074790] pstate: 20400085 (nzCv daIf +PAN -UAO)
  [   21.082096] pc : __find_get_block+0x2e8/0x398
  [   21.088917] lr : __getblk_gfp+0x3c/0x2a8
  [   21.095379] sp : ffff0000099ab7e0
  [   21.101062] x29: ffff0000099ab7e0 x28: 0000000000000000
  [   21.108699] x27: 0000000000000000 x26: 0000000000000000
  [   21.116265] x25: 0000000000000001 x24: 0000000000000000
  [   21.123788] x23: 0000000000000008 x22: ffff801f26116c80
  [   21.131302] x21: ffff801f26116c80 x20: 000000000000245c
  [   21.138808] x19: 0000000000001000 x18: 0000ffffa59c3a70
  [   21.146300] x17: 0000000000000000 x16: 0000000000000000
  [   21.153730] x15: 0000000000000020 x14: 0000000000000012
  [   21.161083] x13: 2f7374696e752f64 x12: 0101010101010101
  [   21.168397] x11: 7f7f7f7f7f7f7f7f x10: ffff00000972d000
  [   21.175689] x9 : 0000000000000000 x8 : ffff801f7ba7e3c0
  [   21.183042] x7 : ffff801f7ba7e3e0 x6 : 0000000000000000
  [   21.190667] x5 : 0000000000000004 x4 : 0000000000000020
  [   21.197955] x3 : 0000000000000008 x2 : 0000000000001000
  [   21.205680] x1 : 000000000000245c x0 : 0000000000000080
  [   21.212918] Call trace:
  [   21.217257]  __find_get_block+0x2e8/0x398
  [   21.223160]  __getblk_gfp+0x3c/0x2a8
  [   21.228644]  ext4_getblk+0xcc/0x1b0
  [   21.233991]  ext4_bread_batch+0x78/0x1c8
  [   21.239726]  ext4_find_entry+0x2d4/0x598
  [   21.245416]  ext4_lookup+0xac/0x278
  [   21.250612]  lookup_slow+0xac/0x190
  [   21.255736]  walk_component+0x228/0x340
  [   21.261151]  link_path_walk+0x2f4/0x568
  [   21.266499]  path_parentat+0x44/0x88
  [   21.271521]  filename_parentat+0xa0/0x170
  [   21.276924]  filename_create+0x60/0x168
  [   21.282082]  SyS_symlinkat+0x80/0x128
  [   21.287013]  el0_svc_naked+0x30/0x34
  [   21.291835] Code: 17ffffe7 a90363b7 a9046bb9 f9002bbb (d4210000)
  [   21.299191] ---[ end trace b07cecc329f07f48 ]---
  [   21.347488] systemd: 35 output lines suppressed due to ratelimiting
  [   21.355094] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
  [   21.355094]
  [   21.366666] SMP: stopping secondary CPUs
  [   21.371817] Kernel Offset: disabled
  [   21.376517] CPU features: 0x00901108
  [   21.381310] Memory Limit: none
  [   21.385617] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
  [   21.385617]
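
  As a triage aid, the oops above can be picked apart mechanically. The
  sketch below is not part of the original report, and the helper names
  parse_call_trace and decode_brk are invented for illustration. It pulls
  the (symbol, offset, size) frames out of the "Call trace:" section and
  decodes the parenthesised word on the "Code:" line as an A64 BRK
  instruction. For this log the word d4210000 decodes to immediate 0x800,
  which, as far as I know, is the value arm64 uses for BUG(). That would be
  consistent with the "kernel BUG at .../fs/buffer.c:1240" line.

```python
import re

# Matches frames such as "[   21.217257]  __find_get_block+0x2e8/0x398".
FRAME_RE = re.compile(r"\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)\s*$")
# Matches the parenthesised (faulting) instruction word on a "Code:" line.
CODE_RE = re.compile(r"Code:.*\(([0-9a-f]{8})\)")


def parse_call_trace(log_text):
    """Return a list of (symbol, offset, size) for frames after 'Call trace:'."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if "Call trace:" in line:
            in_trace = True
            continue
        if in_trace:
            match = FRAME_RE.search(line)
            if not match:
                break  # the first non-frame line ends the trace
            symbol, offset, size = match.groups()
            frames.append((symbol, int(offset, 16), int(size, 16)))
    return frames


def decode_brk(log_text):
    """Return the BRK immediate of the faulting instruction, or None.

    None is returned if there is no "Code:" line or if the parenthesised
    word is not a BRK instruction.
    """
    match = CODE_RE.search(log_text)
    if not match:
        return None
    insn = int(match.group(1), 16)
    # A64 BRK encoding: fixed bits 0xd4200000, imm16 in bits [20:5].
    if insn & 0xFFE0001F != 0xD4200000:
        return None
    return (insn >> 5) & 0xFFFF
```

  The resulting symbol+offset pairs could then be resolved to source lines
  with a tool such as scripts/faddr2line from the kernel tree, assuming a
  vmlinux with debug symbols for this exact build is available.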

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1857074/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
