------- Comment From craig.w...@us.ibm.com 2016-11-14 11:17 EDT-------
dougmill-ibm commented 6 days ago
It appears that one of the NVMe drives failed to function correctly during
boot/probe. The 'lspci' output does not show that drive, which means the
output was captured after the failure but before recovery (reboot). As a
result, we do not have full information on that drive.
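
One way to confirm after each boot whether every expected NVMe controller is
visible is to count the matching lines in the 'lspci' output. The following is
a minimal sketch, not part of the original report: the function names, the
sample text, and the expected count are assumptions for illustration. It
relies only on 'lspci' labelling NVMe devices with the class name
"Non-Volatile memory controller".

```python
# Class name that lspci prints for NVMe devices.
NVME_CLASS = "Non-Volatile memory controller"

# Hypothetical sample of lspci output; addresses and device names are
# illustrative only and do not come from the affected system.
SAMPLE_LSPCI = """\
0000:00:00.0 PCI bridge: Example Vendor Example Bridge
0001:01:00.0 Non-Volatile memory controller: Example Vendor NVMe SSD Controller
0002:01:00.0 Ethernet controller: Example Vendor Gigabit Ethernet
"""


def nvme_addresses(lspci_text):
    """Return the PCI addresses of all NVMe controllers listed in lspci output."""
    addresses = []
    for line in lspci_text.splitlines():
        # Each lspci line looks like "<address> <class>: <device description>".
        address, _, rest = line.partition(" ")
        if rest.startswith(NVME_CLASS):
            addresses.append(address)
    return addresses


def missing_nvme_count(lspci_text, expected):
    """Return how many of the expected NVMe controllers are absent (never negative)."""
    return max(0, expected - len(nvme_addresses(lspci_text)))
```

On a live system the text could be taken from the 'lspci' command itself and
the expected count set to the number of drives installed; a nonzero result
from missing_nvme_count() would flag a boot on which a drive did not probe.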

From a high-level look at the code, the vfree error seems to be for
freeing the PCI BAR. The code path through the initialization might
allow for a failure before or during allocation of the BAR and not
account for that during device removal. The stack trace is not much help
because the "remove dead controller" routine is invoked as "work" on a
kthread, and so we do not have the stack trace of the thread that
actually encountered the original failure (I/O timeout).
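
The suspected pattern can be illustrated with a toy model. This is a sketch
in Python, not the NVMe driver's code: the class ToyController, its methods,
and the fail_before_map flag are all hypothetical names chosen for
illustration. It shows a teardown path that releases a mapping only if
initialization actually created one, which is the accounting the error path
appears to be missing.

```python
class ToyController:
    """Hypothetical stand-in for a device whose BAR mapping may never be set up."""

    def __init__(self):
        self.bar = None      # None means the BAR was never mapped
        self.warnings = []   # records attempts to free a mapping that does not exist

    def probe(self, fail_before_map=False):
        """Initialize the device; optionally fail before the BAR is mapped."""
        if fail_before_map:
            return False     # early failure: self.bar stays None
        self.bar = object()  # placeholder for a real mapping
        return True

    def remove_unguarded(self):
        """Teardown that assumes probe() always got as far as mapping the BAR."""
        if self.bar is None:
            self.warnings.append("freeing nonexistent mapping")
        self.bar = None

    def remove_guarded(self):
        """Teardown that releases the mapping only if one exists."""
        if self.bar is not None:
            self.bar = None
```

In this model, calling remove_unguarded() after a failed probe records a
warning, while remove_guarded() does not.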

So, there are two problems shown here. One is the vfree WARNING which
indicates that the error paths are not quite right. The other is why the
NVMe drive failed to function correctly - which is the primary issue for
this test case. NOTE: the vfree message is only a WARNING and should not
cause any sort of permanent problem with the running kernel.

------- Comment From craig.w...@us.ibm.com 2016-11-14 16:42 EDT-------
Debug continues.

** Tags added: architecture-ppc64le bugnameltc-148618 severity-critical
targetmilestone-inin1604

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1639920

Title:
  NVMe detection failed during bootup

Status in linux package in Ubuntu:
  Triaged

Bug description:
  I've been running an on-off test on a couple of Power8 systems and
  have been getting NVMe drive detection failures on Ubuntu
  16.04.1 only.  I've run the same test on RHEL 7.2 and have not
  encountered this problem. Once the problem occurs, the OS stops
  booting and a message appears:

  Welcome to emergency mode! After logging in, type "journalctl -xb" to view
  system logs, "systemctl reboot" to reboot, "systemctl default" or ^D to try
  again to boot into default mode.
  Give root password for maintenance (or press Control-D to continue):

  ProblemType: Crash
  DistroRelease: Ubuntu 16.04
  Package: apport 2.20.1-0ubuntu2.1
  ProcVersionSignature: Ubuntu 4.4.0-45.66-generic 4.4.21
  Uname: Linux 4.4.0-45-generic ppc64le
  ApportVersion: 2.20.1-0ubuntu2.1
  Architecture: ppc64el
  CrashReports:
   640:0:116:12295:2016-11-07 11:11:27.995107650 -0800:2016-11-07 11:31:31.733627559 -0800:/var/crash/_usr_bin_apport-bug.0.crash
   644:0:116:0:2016-11-07 11:11:28.931104586 -0800:2016-11-07 11:11:28.931104586 -0800:/var/crash/_usr_bin_apport-cli.0.None.hanging
  Date: Mon Nov  7 11:11:28 2016
  ExecutablePath: /usr/bin/apport-bug
  InstallationDate: Installed on 2016-11-05 (2 days ago)
  InstallationMedia: Ubuntu-Server 16.04.1 LTS "Xenial Xerus" - Release ppc64el (20160719)
  InterpreterPath: /usr/bin/python3.5
  PackageArchitecture: all
  ProcCmdline: /usr/bin/python3 /usr/bin/apport-cli --hanging
  ProcEnviron:
   TERM=linux
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcLoadAvg: 0.30 0.41 0.19 1/1132 2428
  ProcLocks:
   
  ProcSwaps:
   Filename                             Type            Size    Used    Priority
   /dev/sda3                               partition    157914048       0       -1
  ProcVersion: Linux version 4.4.0-45-generic (buildd@bos01-ppc64el-030) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.2) ) #66-Ubuntu SMP Wed Oct 19 14:13:11 UTC 2016
  PythonArgs: ['/usr/bin/ubuntu-bug', '--hanging']
  SourcePackage: apport
  Title: apport-bug crashed with TypeError in run_hang(): int() argument must be a string, a bytes-like object or a number, not 'NoneType'
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
   
  cpu_cores: Number of cores present = 20
  cpu_coreson: Number of cores online = 20
  cpu_smt: SMT=8
  mtime.conffile..etc.apport.crashdb.conf: 2016-11-07T10:38:18.528739

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1639920/+subscriptions
