------- Comment From craig.w...@us.ibm.com 2016-11-14 11:17 EDT-------
dougmill-ibm commented 6 days ago

It appears that one of the NVMe drives failed to function correctly during boot/probe. The 'lspci' output does not show that drive, which means it was taken after the failure but before recovery (reboot), so we do not have full information on that drive.
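One way to confirm the observation above on a running system, without relying on a saved 'lspci' capture, is to read the PCI class codes from sysfs. The following is a minimal sketch, not part of the original report: the function names and the list of expected PCI addresses are hypothetical, and it assumes the standard Linux sysfs layout under /sys/bus/pci/devices, where each device directory has a "class" file. NVMe controllers report class code 0x010802.

```python
from pathlib import Path

# PCI class code for an NVMe controller:
# 0x01 = mass storage, 0x08 = non-volatile memory, 0x02 = NVMe interface.
NVME_CLASS = 0x010802


def find_nvme_functions(sysfs_root="/sys/bus/pci/devices"):
    """Return sorted PCI addresses whose class code marks an NVMe controller."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    found = []
    for dev in root.iterdir():
        try:
            # The sysfs "class" file holds a hex string such as "0x010802".
            class_code = int((dev / "class").read_text().strip(), 16)
        except (OSError, ValueError):
            continue  # unreadable or malformed entry: skip it
        if class_code == NVME_CLASS:
            found.append(dev.name)
    return sorted(found)


def missing_controllers(expected, sysfs_root="/sys/bus/pci/devices"):
    """Return the expected PCI addresses that are not currently enumerated."""
    present = set(find_nvme_functions(sysfs_root))
    return sorted(set(expected) - present)


if __name__ == "__main__":
    # Hypothetical usage: print whatever NVMe functions the kernel sees now.
    for address in find_nvme_functions():
        print(address)
```

Run after each boot of an on-off test, a check like this would record which controller dropped off the bus at the time of the failure, rather than after the fact.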
From a high-level look at the code, the vfree error seems to be for freeing the PCI BAR. The code path through the initialization might allow for a failure before/during allocation of the BAR and not account for that during device removal. The stack trace is not much help because the "remove dead controller" routine is invoked as "work" on a kthread, and so we do not have the stack trace of the thread that actually encountered the original failure (I/O timeout).

So, there are two problems shown here. One is the vfree WARNING, which indicates that the error paths are not quite right. The other is why the NVMe drive failed to function correctly, which is the primary issue for this test case.

NOTE: the vfree message is only a WARNING and should not cause any sort of permanent problem with the running kernel.

------- Comment From craig.w...@us.ibm.com 2016-11-14 16:42 EDT-------
Debug continues.

** Tags added: architecture-ppc64le bugnameltc-148618 severity-critical targetmilestone-inin1604

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1639920

Title:
  NVMe detection failed during bootup

Status in linux package in Ubuntu:
  Triaged

Bug description:
I've been running an on-off test on a couple of Power8 systems and have been getting a failure of detection on the NVMe drives on Ubuntu 16.04.1 only. I've run the same test on RHEL 7.2 and have not encountered this problem.

Once the problem occurs, the OS stops booting and a message appears:

Welcome to emergency mode! After logging in, type "journalctl -xb" to view system logs, "systemctl reboot" to reboot, "systemctl default" or ^D to try again to boot into default mode.
Give root password for maintenance (or press Control-D to continue):

ProblemType: Crash
DistroRelease: Ubuntu 16.04
Package: apport 2.20.1-0ubuntu2.1
ProcVersionSignature: Ubuntu 4.4.0-45.66-generic 4.4.21
Uname: Linux 4.4.0-45-generic ppc64le
ApportVersion: 2.20.1-0ubuntu2.1
Architecture: ppc64el
CrashReports:
 640:0:116:12295:2016-11-07 11:11:27.995107650 -0800:2016-11-07 11:31:31.733627559 -0800:/var/crash/_usr_bin_apport-bug.0.crash
 644:0:116:0:2016-11-07 11:11:28.931104586 -0800:2016-11-07 11:11:28.931104586 -0800:/var/crash/_usr_bin_apport-cli.0.None.hanging
Date: Mon Nov 7 11:11:28 2016
ExecutablePath: /usr/bin/apport-bug
InstallationDate: Installed on 2016-11-05 (2 days ago)
InstallationMedia: Ubuntu-Server 16.04.1 LTS "Xenial Xerus" - Release ppc64el (20160719)
InterpreterPath: /usr/bin/python3.5
PackageArchitecture: all
ProcCmdline: /usr/bin/python3 /usr/bin/apport-cli --hanging
ProcEnviron:
 TERM=linux
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcLoadAvg: 0.30 0.41 0.19 1/1132 2428
ProcLocks:
ProcSwaps:
 Filename   Type       Size       Used  Priority
 /dev/sda3  partition  157914048  0     -1
ProcVersion: Linux version 4.4.0-45-generic (buildd@bos01-ppc64el-030) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.2) ) #66-Ubuntu SMP Wed Oct 19 14:13:11 UTC 2016
PythonArgs: ['/usr/bin/ubuntu-bug', '--hanging']
SourcePackage: apport
Title: apport-bug crashed with TypeError in run_hang(): int() argument must be a string, a bytes-like object or a number, not 'NoneType'
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
cpu_cores: Number of cores present = 20
cpu_coreson: Number of cores online = 20
cpu_smt: SMT=8
mtime.conffile..etc.apport.crashdb.conf: 2016-11-07T10:38:18.528739

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1639920/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp