------- Comment From [email protected] 2017-07-04 09:03 EDT-------
This CMVC defect is being cancelled by the CDE Bridge because the corresponding CQ Defect [SW354783] was transferred out of the bridge domain.

Here are the additional details:
New Subsystem = ppc_triage
New Release = unspecified
New Component = ubuntu_linux
New OwnerInfo = Chavez, Luciano ([email protected])

To continue tracking this issue, please follow CQ defect [SW354783].

Opened defect SW355478 on new fail to see if it is the same issue. I made sev 1 since system in XMON right now and is preventing further testing. Like I mentioned earlier, the fail could be related to this defect.

For this defect... The "Oops: Kernel access of bad area, sig: 11 [#1]" in the logs happens during HTX run. On the reboot (that happened ~30 minutes after first error), I saw partition hang/crash. I had to use ipmitool to power down system.

Current xmon crash in SW355478 / 142348 is different than one being tracked in this bug. Will wait for recreate of original issue.

The FlashGT HST team still needs to recreate this issue.

SW357236 "HTX fail during superpipe 128 per LUN testing...during Guardband Testing" is now marked as a duplicate of this SW354783. Per comment from JVP (SW357236 submitter), he is attempting a recreate again with the latest Firmware for his Tuleta-L. We will monitor that attempt at recreate, and reopen this SW354783 if a new recreate is achieved. This original recreate attempt on Firestone, fsbmc30, may be delayed, as it is currently tied up with debugging a link training issue.

<Automated Update>
The severity of defect SW354783 was increased from 2 to 1 because defect SW358210 was rejected as the duplicate of defect SW354783 and the severity of defect SW358210 was higher than 2

Defect submitter, Dion is out on vacation until 7/11. So we can make progress on this most recent recreate, SW358210 dup'd to this SW354783, I request the defect Owner, Luciano/ScreenTeam, to please reopen this SW354783 and continue live debug on the held system from SW358210:

#=#=# 2016-07-05 17:12:28 (CDT) #=#=#
Action = [reopen]
I'm not quite sure how to handle this (I'll ping Mark Smith) defect. Dion's defect SW358210 : FlashGT STC GA3: capiredp01: TMF timed out and Unable to handle kernel paging request before system drops into xmon debugger, was running HTX for superpipe with 1600 virtual luns across 4 FlashGT NVME cards was just dup'd to this one. That system is currently in XMON debugger now and can be debugged to 1) verify it is same issue and 2) maybe try to find root cause (his defect can be re-opened if not the same issue).
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#

Not able to look SW358210. Looking into machine capiredp01 box.

Machine details:
FSP: capiredfsp.aus.stglabs.ibm.com (dev/FipSdev)
Partition: capiredp01.aus.stglabs.ibm.com
IPMI console:
ipmitool -I lanplus -H capiredfsp.aus.stglabs.ibm.com -P abc123 sol activate

Fail on "capiredfsp" seems same as reported in this bug. hxesurelock process has segfaulted and kernel has crashed while generating core dump.

cde00 ([email protected]) added native attachment /tmp/AIXOS05866176/dmesg_backtrace_capiredfsp on 2016-07-07 06:19:39

Hi Dominic,
Can you please have some one from kernel team look at this ?
HTX (hxesurelock) process has segfaulted and kernel has crashed while generating core. Attached kernel logs with bug . Machine is sitting in xmon and available for debug.

(In reply to comment #25)
> Hi Dominic,
> Can you please have some one from kernel team look at this ?
> HTX (hxesurelock) process has segfaulted and kernel has crashed while
> generating core. Attached kernel logs with bug . Machine is sitting in
> xmon and available for debug.

Vipin,
I cannot ssh to capiredfsp.aus.stglabs.ibm.com (dev/FipSdev). Is the machine still in xmon?

(In reply to comment #26)
> Vipin,
> I cannot ssh to capiredfsp.aus.stglabs.ibm.com (dev/FipSdev). Is the machine
> still in xmon?

Yes its still sitting in xmon. You can open console via IPMI. Please see comment 22 for machine access details.

Just wanted to point out the send_tmf timeout (at the end of the kernel log) before the crash even though I am not sure it is the cause. The system is in xmon. Please advise if additional debug data need to be collected. Thanks.

Snippet at the end of the kernel log:

[ 8801.190528] cxlflash 0007:00:00.0: send_tmf: TMF timed out!
[ 8806.190383] cxlflash 0007:00:00.0: send_tmf: TMF timed out!
[ 8816.507485] hxesurelock[14180]: unhandled signal 11 at 0000000000000024 nip 00003fff852c2ee8 lr 00003fff852c2938 code 30001
[ 8816.511368] hxesurelock[13501]: unhandled signal 11 at 0000000000000024 nip 00003fff890b2ee8 lr 00003fff890b2938 code 30001
[ 8816.526807] Unable to handle kernel paging request for data at address 0x0000000c
[ 8816.526928] Faulting instruction address: 0xc00000000035e2b0
[ 8816.530233] Unable to handle kernel paging request for data at address 0x0000000c
[ 8816.530596] Faulting instruction address: 0xc00000000035e2b0

Snippet of the send_tmf() code:

        cmd_checkin(cmd);
        spin_lock_irqsave(&cfg->tmf_slock, lock_flags);
        cfg->tmf_active = false;
        spin_unlock_irqrestore(&cfg->tmf_slock, lock_flags);
        goto out;
    }

    spin_lock_irqsave(&cfg->tmf_slock, lock_flags);
    to = msecs_to_jiffies(5000);
    to = wait_event_interruptible_lock_irq_timeout(cfg->tmf_waitq,
                                                   !cfg->tmf_active,
                                                   cfg->tmf_slock,
                                                   to);
    if (!to) {
        cfg->tmf_active = false;
        dev_err(dev, "%s: TMF timed out!\n", __func__);
        rc = -1;
    }
    spin_unlock_irqrestore(&cfg->tmf_slock, lock_flags);

Boqun,

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1667239

Title:
  FlashGT Integration and Setup: fsbmc30: After 17th reboot of soft bootme, HTX & Linux errors seen with 256 virtual LUNs

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Yakkety:
  Fix Released

Bug description:

== Comment: #1 - Application Cdeadmin <[email protected]> - 2016-06-02 15:28:27 ==

==== State: Open by: anitrap on 01 June 2016 17:36:39 ====

Contact: Anitra Powell ([email protected] )
Backup: Dion Bell ([email protected])

Primary BMC (1603G):
=====================================================
# cat /proc/ractrends/Helper/FwInfo
FW_VERSION=2.13.91819
FW_DATE=Mar 10 2016
FW_BUILDTIME=10:59:31 CDT
FW_DESC=8335 SRC BUILD RR9 03102016
FW_PRODUCTID=1
FW_RELEASEID=RR9
FW_CODEBASEVERSION=2.X
#

PNOR (1603G):
========================
# ipmitool -H 127.0.0.1 -I lanplus -U ADMIN -P admin fru list 47
Product Name : OpenPOWER Firmware
Product Version : IBM-firestone-ibm-OP8_v1.7_1.62
Product Extra : hostboot-bc98d0b-1a29dff
Product Extra : occ-0362706-16fdfa7
Product Extra : skiboot-5.1.13
Product Extra : hostboot-binaries-43d5a59
Product Extra : firestone-xml-e7b4fa2-c302f0e
Product Extra : capp-ucode-105cb8f

Partition Info:
=================
ver 1.5.4.3 - OS, HTX, Firmware and Machine details
OS: GNU/Linux
OS Version: Ubuntu 16.04 LTS \n \l
Kernel Version: 4.4.8c0ffee0+
HTX Version: htxubuntu-396
Host Name: fsbmc30p1
Machine Serial No: 210995A
Machine Type/Model: 8335-GCA

root@fsbmc30p1:~# uname -a
Linux fsbmc30p1 4.4.8c0ffee0+ #2 SMP Tue May 24 10:50:26 CDT 2016 ppc64le ppc64le ppc64le GNU/Linux

FlashGT NVMe setup:
===================
1 FlashGT card in slot 1 running in superpipe mode with 128 LUNs per port (total of 256 LUNs).

lsscsi
[0:0:0:0]  disk    ATA   ST1000NX0313       BE33  /dev/sda
[1:0:0:0]  disk    ATA   ST1000NX0313       BE33  /dev/sdb
[4:0:0:0]  disk    NVMe  SAMSUNG MZ1LV960   3011  /dev/sdc
[4:1:0:0]  disk    NVMe  SAMSUNG MZ1LV960   3011  /dev/sdd
[5:0:0:0]  cd/dvd  AMI   Virtual CDROM0     1.00  /dev/sr0
[5:0:0:1]  cd/dvd  AMI   Virtual CDROM1     1.00  /dev/sr1
[5:0:0:2]  cd/dvd  AMI   Virtual CDROM2     1.00  /dev/sr2
[5:0:0:3]  cd/dvd  AMI   Virtual CDROM3     1.00  /dev/sr3
[6:0:0:0]  disk    AMI   Virtual Floppy0    1.00  /dev/sde
[6:0:0:1]  disk    AMI   Virtual Floppy1    1.00  /dev/sdf
[6:0:0:2]  disk    AMI   Virtual Floppy2    1.00  /dev/sdg
[6:0:0:3]  disk    AMI   Virtual Floppy3    1.00  /dev/sdh
[7:0:0:0]  disk    AMI   Virtual HDisk0     1.00  /dev/sdi
[7:0:0:1]  disk    AMI   Virtual HDisk1     1.00  /dev/sdj
[7:0:0:2]  disk    AMI   Virtual HDisk2     1.00  /dev/sdk
[7:0:0:3]  disk    AMI   Virtual HDisk3     1.00  /dev/sdl
[7:0:0:4]  disk    AMI   Virtual HDisk4     1.00  /dev/sdm

lspci | grep -i acc
0004:01:00.0 Processing accelerators: IBM Device 0601 (rev 01)

ls -l /sys/class/cxl
total 0
lrwxrwxrwx 1 root root 0 May 31 13:27 afu0.0 -> ../../devices/pci0004:00/0004:00:00.0/0004:01:00.0/cxl/card0/afu0.0
lrwxrwxrwx 1 root root 0 May 31 13:27 afu0.0m -> ../../devices/pci0004:00/0004:00:00.0/0004:01:00.0/cxl/card0/afu0.0/afu0.0m
lrwxrwxrwx 1 root root 0 May 31 13:27 afu0.0s -> ../../devices/pci0004:00/0004:00:00.0/0004:01:00.0/cxl/card0/afu0.0/afu0.0s
lrwxrwxrwx 1 root root 0 May 31 13:27 card0 -> ../../devices/pci0004:00/0004:00:00.0/0004:01:00.0/cxl/card0

lscfg | grep afu
+ afu0.0 Slot1/card0/afu0.0
+ afu0.0m Slot1/card0/afu0.0/afu0.0m
+ afu0.0s Slot1/card0/afu0.0/afu0.0s

/opt/ibm/capikv/bin/cxlfstatus
CXL Flash Device Status

Found 0601 0004:01:00.0 Slot1
  Device:  SCSI      Block  Mode       LUN WWID
     sg2:  4:0:0:0,  sdc,   superpipe, 60025380025382463300046000000000
     sg3:  4:1:0:0,  sdd,   superpipe, 60025380025382463300052000000000

dpkg -l | grep capi
4el no description given 3.0-1970-3042652 ppc6
4el no description given 3.0-1970-3042652 ppc6

root@fsbmc30p1:/tmp# dpkg -l | grep afu
ii afuimage 3.0-1970-3042652 all no description given

cat /opt/ibm/capikv/version.txt
1970-3042652

/opt/ibm/capikv/afu/cxl_afu_dump /dev/cxl/afu0.0m -v
AFU Version = 160525N1
NVMe0 Version = BTV73011
NVMe0 NEXT = BTV73011
NVMe0 STATUS = 0x702
NVMe1 Version = BTV73011
NVMe1 NEXT = BTV73011
NVMe1 STATUS = 0x702

cat /tmp/test_lun_mode
128

Problem:
===========
While running soft bootme (shutdown -r from OS every hour), I noticed htx errors after the 9th & 17th reboot of partition. At this point they seem like different issues so I am opening up 2 different defects. I've already opened up defect SW354759 for the first set of htx errors and assigned to htx_screen.

This defect is for the issue that happened after the 17th reboot (Jun 1 @ 6am). On the 18th reboot (Jun 1 @ 7am), the shutdown -r command failed... I had to manually power down system.

I guess I will open to surelock_screen first since it seems similar to the one Dion opened up while running 128 virtual LUNs per port (defect http://w3.rchland.ibm.com/projects/bestquest/?defect=SW353881) . For this fail, other exercisers eventually failed also.

Test Info:
============
- running Soft bootme (shutdown -r every hour)
- mdt.bu + hxecom (GPUs were running). I copied a modified mdt.bu to another mdt file so I would not see any errors in htx after reboot.

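A note for reading the HTX error records in this report: the err= field appears to be a hexadecimal errno value. That is an inference from the samples, where err=00000010 is paired with "Device or resource busy", err=00000005 with "Input/output error", and err=0000000b with "errno: 11, Resource temporarily unavailable"; HTX documentation was not consulted. A minimal sketch of the decoding, using a hand-written table of the three Linux errno values involved (the names LINUX_ERRNO and decode_htx_err are hypothetical helpers, not part of HTX):

```python
import re

# Linux errno values (asm-generic/errno-base.h). Only the three codes that
# show up in the HTX sample are listed; this is not a complete table.
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    11: ("EAGAIN", "Resource temporarily unavailable"),
    16: ("EBUSY", "Device or resource busy"),
}

# Assumption: err= is always followed by exactly eight hex digits.
ERR_FIELD = re.compile(r"\berr=([0-9a-fA-F]{8})\b")


def decode_htx_err(record):
    """Return (code, name, text) for the err= field of one HTX record line,
    or None when the line has no err= field."""
    match = ERR_FIELD.search(record)
    if match is None:
        return None
    code = int(match.group(1), 16)  # the field is hex, so 00000010 is 16
    name, text = LINUX_ERRNO.get(code, ("UNKNOWN", "not in table"))
    return code, name, text


if __name__ == "__main__":
    sample = "/dev/sg2.53 Jun 1 06:26:53 2016 err=00000010 sev=4 hxesurelock"
    print(decode_htx_err(sample))
```

Read this way, the cblk_read failures are EBUSY and EIO returned to hxesurelock, while the later hxefpu64 and hxesctu failures are EAGAIN from pthread_create.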
Sample of HTX errors (for this defect)
==============================
/dev/sg2.53 Jun 1 06:26:53 2016 err=00000010 sev=4 hxesurelock
READCMP5 numopers= 20000 loop= 4956 blk=0x4eee len= 4096 offset=0
Seed Values= 37882, 44181, 50758
Data Pattern Seed Values = 37882, 44182, 50758
LBA Fencepost = 0xb94a
cblk_read error - Device or resource busy

/dev/sg2.18 Jun 1 06:26:53 2016 err=00000010 sev=4 hxesurelock
READCMP9 numopers= 20000 loop= 1501 blk=0x93f1 len= 4096 offset=0
Seed Values= 37847, 44740, 50780
Data Pattern Seed Values = 37847, 44741, 50780
LBA Fencepost = 0xb275
cblk_read error - Device or resource busy

/dev/sg2.98 Jun 1 06:26:53 2016 err=00000010 sev=4 hxesurelock
READCMP5 numopers= 20000 loop= 10365 blk=0x86d5 len= 4096 offset=0
Seed Values= 37927, 41320, 50710
Data Pattern Seed Values = 37927, 41321, 50710
LBA Fencepost = 0xbc7c
cblk_read error - Device or resource busy

/dev/sg2.116 Jun 1 06:30:45 2016 err=00000005 sev=4 hxesurelock
RDCMP10 numopers= 20000 loop= 6383 blk=0xc33d len= 4096 offset=0
Seed Values= 37945, 49039, 50726
Data Pattern Seed Values = 37945, 49040, 50726
LBA Fencepost = 0xd0b0
cblk_read error - Input/output error

/dev/fpu17 Jun 1 06:30:51 2016 err=0000000b sev=1 hxefpu64
pthread_create call failed with rc: 11, errno: 11, Resource temporarily unavailable

/dev/fpu17 Jun 1 06:30:51 2016 err=0000000b sev=1 hxefpu64
Hardware Exerciser stopped on an error

/dev/sctu43 Jun 1 06:30:51 2016 err=0000000b sev=1 hxesctu
pthread_create call failed with rc: 11, errno: 11, Resource temporarily unavailable

/dev/sctu43 Jun 1 06:30:51 2016 err=0000000b sev=1 hxesctu
Hardware Exerciser stopped on an error

Logs:
======
/gsa/ausgsa/home/a/n/anitrap/web/public/fsbmc30/softbootme_fail_1
/gsa/ausgsa/home/a/n/anitrap/web/public/fsbmc30/softbootme_fail_1/htxerr
/gsa/ausgsa/home/a/n/anitrap/web/public/fsbmc30/softbootme_fail_1/syslog
/gsa/ausgsa/home/a/n/anitrap/web/public/fsbmc30/softbootme_fail_1/kern.log
/gsa/ausgsa/home/a/n/anitrap/web/public/fsbmc30/softbootme_fail_1/bootme.log

sample of syslog during first htx error:
================================================
Jun 1 06:19:20 fsbmc30p1 systemd[1]: Started Cleanup of Temporary Directories.
Jun 1 06:25:01 fsbmc30p1 rsyslogd-2007: action 'action 10' suspended, next retry is Wed Jun 1 06:25:31 2016 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
Jun 1 06:25:01 fsbmc30p1 CRON[99327]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ))
Jun 1 06:26:53 fsbmc30p1 CXLBLK[37882]: cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num = 0x607,for chunk->dev_name = /dev/sg2, chunk index = 0
Jun 1 06:26:53 fsbmc30p1 rsyslogd-2007: action 'action 10' suspended, next retry is Wed Jun 1 06:27:23 2016 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
Jun 1 06:26:53 fsbmc30p1 CXLBLK[37847]: cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num = 0x607,for chunk->dev_name = /dev/sg2, chunk index = 0
Jun 1 06:26:53 fsbmc30p1 CXLBLK[37927]: cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num = 0x607,for chunk->dev_name = /dev/sg2, chunk index = 0
Jun 1 06:26:59 fsbmc30p1 CXLBLK[37961]: cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num = 0x607,for chunk->dev_name = /dev/sg3, chunk index = 0
Jun 1 06:26:59 fsbmc30p1 CXLBLK[37954]: cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num = 0x607,for chunk->dev_name = /dev/sg2, chunk index = 0
Jun 1 06:26:59 fsbmc30p1 CXLBLK[37887]: cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num = 0x607,for chunk->dev_name = /dev/sg2, chunk index = 0
Jun 1 06:26:59 fsbmc30p1 kernel: [ 1378.248405] hrtimer: interrupt took 200250 ns

sample from kern.log during fail:
=================================
Jun 1 06:08:11 fsbmc30p1 kernel: [ 250.251041] nvidia-uvm: Loaded the UVM driver in lite mode, major device number 241
Jun 1 06:26:59 fsbmc30p1 kernel: [ 1378.248405] hrtimer: interrupt took 200250 ns
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.764382] hxesurelock[40392]: unhandled signal 11 at 0000000000000024 nip 00003fff84602978 lr 00003fff84602974 code 30001
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.868242] Unable to handle kernel paging request for data at address 0x0000000c
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.868599] Faulting instruction address: 0xc00000000035e2b0
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.868865] Oops: Kernel access of bad area, sig: 11 [#1]
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.868928] SMP NR_CPUS=2048 NUMA PowerNV
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.868992] Modules linked in: nvidia_uvm(POE) iptable_filter ip_tables x_tables nvidia(POE) ipmi_devintf joydev input_leds mac_hid opal_prd ofpart cmdlinepart powernv_flash mtd at24 ipmi_powernv ipmi_msghandler uio_pdrv_genirq uio ibmpowernv powernv_rng binfmt_misc nfsd ib_iser auth_rpcgss rdma_cm iw_cm ib_cm nfs_acl ib_sa ib_mad lockd ib_core grace ib_addr sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear mlx4_en hid_generic usbhid hid uas usb_storage cxlflash ast bnx2x i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm cxl vxlan mlx4_core ahci ip6_udp_tunnel udp_tunnel libahci mdio libcrc32c
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870299] CPU: 80 PID: 40392 Comm: hxesurelock Tainted: P OE 4.4.8c0ffee0+ #2
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870379] task: c000007935fe23a0 ti: c000007910810000 task.ti: c000007910810000
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870476] NIP: c00000000035e2b0 LR: c00000000035e280 CTR: 0000000000000000
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870552] REGS: c0000079108135e0 TRAP: 0300 Tainted: P OE (4.4.8c0ffee0+)
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870642] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28053988 XER: 00000000
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] CFAR: c000000000008468 DAR: 000000000000000c DSISR: 40000000 SOFTE: 1
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR00: c00000000035e280 c000007910813860 c000000001594600 0000000000000000
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR04: c000007823192400 000000000002574f 0000000000000001 0000000000000000
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR08: c0000079241b8a00 0000000000000000 00000000000044fb 65776f702f62696c
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR12: 2d656c3436637072 c00000000fb6f800 00000000464c457f 0000000000010c78
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR16: 0000000000000000 0000000000000039 d000000034fa04c5 0000000000010000
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR20: 00000000000000cd 0000000000000550 0000000000010000 00000000039e0000
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR24: 00003fffffffffff c000007910813af8 c000007823192600 c00000793f57b980
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR28: c00000793f573e80 00003fffffffffff 000000000000001f c000007926f29790
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872149] NIP [c00000000035e2b0] elf_core_dump+0xd60/0x1300
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872277] LR [c00000000035e280] elf_core_dump+0xd30/0x1300
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872351] Call Trace:
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872407] [c000007910813860] [c00000000035e280] elf_core_dump+0xd30/0x1300 (unreliable)
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872527] [c000007910813a60] [c00000000036898c] do_coredump+0xcec/0x11e0
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872625] [c000007910813c20] [c0000000000ce7a0] get_signal+0x540/0x7b0
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872705] [c000007910813d10] [c000000000017344] do_signal+0x54/0x2b0
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872785] [c000007910813e00] [c00000000001776c] do_notify_resume+0xbc/0xd0
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872877] [c000007910813e30] [c000000000009838] ret_from_except_lite+0x64/0x68
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872963] Instruction dump:
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.873004] 60000000 2fa30000 409effa8 e95f0050 39200000 794737e3 4082ffa4 e91f00a0
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.873148] 2fa80000 419e002c e92800f8 e9290000 <8129000c> 79279fe3 41820018 7948efe3
Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.884655] ---[ end trace f8abb6e0d0322daa ]---

gsave info:
==============
GSA Location: /gsa/ausgsa/projects/s/sift/hst/trial_data/Surelock/Ubuntu/flashgt/fsbmc30p1_ubuntu1604_FlashGT_bootme_test5/FAIL201606011024

<===== This is from RTC side description =====>
See the Discussion field for the initial comments from CQ.
</===== This is from RTC side description =====>

==== State: Open by: mpvageli on 02 June 2016 14:20:06 ====

Oops: Kernel access of bad area, sig: 11 [#1]

# ipmitool -H 127.0.0.1 -I lanplus -U ADMIN -P admin fru list 47
Product Name : OpenPOWER Firmware
Product Version : IBM-firestone-ibm-OP8_v1.7_1.62
Product Extra : hostboot-bc98d0b-1a29dff
Product Extra : occ-0362706-16fdfa7
Product Extra : skiboot-5.1.13
Product Extra : hostboot-binaries-43d5a59
Product Extra : firestone-xml-e7b4fa2-c302f0e
Product Extra : capp-ucode-105cb8f

== Comment: #9 - VIPIN K. PARASHAR <[email protected]> - 2016-06-07 12:04:49 ==

root@fsbmc30p1:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04 LTS
Release: 16.04
Codename: xenial
root@fsbmc30p1:~# cat /etc/*release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04 LTS"
NAME="Ubuntu"
VERSION="16.04 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
UBUNTU_CODENAME=xenial
root@fsbmc30p1:~# uname -a
Linux fsbmc30p1 4.4.8c0ffee0+ #2 SMP Tue May 24 10:50:26 CDT 2016 ppc64le ppc64le ppc64le GNU/Linux
root@fsbmc30p1:~#

== Comment: #24 - VIPIN K. PARASHAR <[email protected]> - 2016-07-07 07:14:05 ==

From kernel logs
===========
[ 7087.918089] device enP3p5s0f2 left promiscuous mode
[ 8801.190528] cxlflash 0007:00:00.0: send_tmf: TMF timed out!
[ 8806.190383] cxlflash 0007:00:00.0: send_tmf: TMF timed out!
[ 8816.507485] hxesurelock[14180]: unhandled signal 11 at 0000000000000024 nip 00003fff852c2ee8 lr 00003fff852c2938 code 30001
[ 8816.511368] hxesurelock[13501]: unhandled signal 11 at 0000000000000024 nip 00003fff890b2ee8 lr 00003fff890b2938 code 30001
[ 8816.526807] Unable to handle kernel paging request for data at address 0x0000000c
[ 8816.526928] Faulting instruction address: 0xc00000000035e2b0
[ 8816.530233] Unable to handle kernel paging request for data at address 0x0000000c
[ 8816.530596] Faulting instruction address: 0xc00000000035e2b0

3f:mon> t
[c000000686a13a60] c00000000036898c do_coredump+0xcec/0x11e0
[c000000686a13c20] c0000000000ce7a0 get_signal+0x540/0x7b0
[c000000686a13d10] c000000000017344 do_signal+0x54/0x2b0
[c000000686a13e00] c00000000001776c do_notify_resume+0xbc/0xd0
[c000000686a13e30] c000000000009838 ret_from_except_lite+0x64/0x68
--- Exception: 300 (Data Access) at 00003fff890b2ee8
SP (3fff83c2c490) is in userspace
3f:mon> r
R00 = c00000000035e280   R16 = 0000000000000000
R01 = c000000686a13860   R17 = 0000000000000042
R02 = c000000001594600   R18 = d000000021b104fa
R03 = 0000000000000000   R19 = 0000000000010000
R04 = c000002fb7463400   R20 = 00000000000000cd
R05 = 00000000000001bf   R21 = 0000000000000628
R06 = 0000000000000001   R22 = 0000000000010000
R07 = 0000000000000000   R23 = 0000000000250000
R08 = c00000281af21500   R24 = 00003fffffffffff
R09 = 0000000000000000   R25 = c000000686a13af8
R10 = 00000000000044fb   R26 = c000002fb7463800
R11 = 6c2d656c34366370   R27 = c000002ff0e05cc0
R12 = 756e672d78756e69   R28 = c000002ff0e05c40
R13 = c00000000fb65680   R29 = 00003fffffffffff
R14 = 00000000464c457f   R30 = 0000000000000016
R15 = 0000000000010e70   R31 = c000002fb94bd3b8
pc  = c00000000035e2b0 elf_core_dump+0xd60/0x1300
cfar= c000000000008468 slb_miss_realmode+0x50/0x78
lr  = c00000000035e280 elf_core_dump+0xd30/0x1300
msr = 9000000100009033   cr  = 28053828
ctr = 0000000000000000   xer = 0000000000000000   trap = 300
dar = 000000000000000c   dsisr = 40000000
3f:mon>

hxesurelock process has segfaulted and kernel has crashed while dumping core.

== Comment: #87 - Frederic Barrat <[email protected]> - 2017-02-21 11:50:40 ==

Fix is in kernel v4.10:
bdecf76e319a29735d828575f4a9269f0e17c547
"cxl: Fix coredump generation when cxl_get_fd() is used"

We'd like to have it backported to 16.10 and 16.04 LTS.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1667239/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help : https://help.launchpad.net/ListHelp
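
An aside on reading the oops in this report: the "Instruction dump" marks the faulting word as <8129000c>, preceded by e92800f8 and e9290000. The sketch below decodes just those load encodings to show how they line up with the reported DAR of 0x0c and with GPR09 / R09 being zero in both register dumps. It is an interpretation added for illustration, not something stated by the commenters, and it handles only primary opcodes 32 (lwz) and 58 (ld, ldu, lwa); the function names are hypothetical.

```python
# Decoder for the PowerPC load forms seen around the faulting instruction.
# Only primary opcode 32 (lwz, D-form) and 58 (ld/ldu/lwa, DS-form) are
# handled; any other word is returned as a raw .long.

def sign_extend16(value):
    """Interpret a 16-bit field as a signed displacement."""
    return value - 0x10000 if value & 0x8000 else value


def decode(word):
    opcode = (word >> 26) & 0x3F  # primary opcode: top 6 bits
    rt = (word >> 21) & 0x1F      # target register
    ra = (word >> 16) & 0x1F      # base register
    if opcode == 32:
        return f"lwz r{rt},{sign_extend16(word & 0xFFFF)}(r{ra})"
    if opcode == 58:
        # DS-form: low 2 bits select the instruction, the rest is the offset.
        mnemonic = {0: "ld", 1: "ldu", 2: "lwa"}.get(word & 0x3)
        if mnemonic is not None:
            return f"{mnemonic} r{rt},{sign_extend16(word & 0xFFFC)}(r{ra})"
    return f".long 0x{word:08x}"


def effective_address(word, base_value):
    """Address touched by a D-form load, given the base register's value."""
    return (base_value + sign_extend16(word & 0xFFFF)) & 0xFFFFFFFFFFFFFFFF


if __name__ == "__main__":
    for word in (0xE92800F8, 0xE9290000, 0x8129000C):
        print(f"{word:08x}  {decode(word)}")
    # r9 is 0 in the register dumps, so the load touches address 0xc.
    print(hex(effective_address(0x8129000C, 0)))
```

The three words decode to ld r9,248(r8), ld r9,0(r9) and lwz r9,12(r9). With r9 equal to zero the last load reads address 0xc, which matches the DAR, so the crash in elf_core_dump looks like a NULL pointer being dereferenced at offset 12. Which kernel structure that pointer belongs to cannot be determined from the dump alone.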

