** Changed in: crash (Ubuntu Xenial)
     Assignee: Louis Bouchard (louis) => (unassigned)

** Changed in: makedumpfile (Ubuntu Xenial)
     Assignee: Louis Bouchard (louis) => (unassigned)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to crash in Ubuntu.
https://bugs.launchpad.net/bugs/1655625

Title:
  ISST-LTE:pVM:roselp4:ubuntu 16.04.2: vmcore cannot be analysed by
  crash

Status in The Ubuntu-power-systems project:
  Fix Committed
Status in crash package in Ubuntu:
  Confirmed
Status in makedumpfile package in Ubuntu:
  Confirmed
Status in crash source package in Xenial:
  Fix Released
Status in makedumpfile source package in Xenial:
  Fix Released

Bug description:
  [SRU justification]
  This fix is required to make the crash tool usable. It also improves
  makedumpfile's filtering of pages.

  [Impact]
  Kernel crashes cannot be analysed with the crash tool.
  makedumpfile filters pages incorrectly.

  [Fix]
  Cherry-pick upstream commits fixing those issues.

  [Test Case]
  Without the fix, running the crash tool on a kernel crash file displays
  something like:

  # crash -s usr/lib/debug/boot/vmlinux-4.8.0-34-generic
  crash: read error: kernel virtual address: ffffffff81e29ff0  type: "pv_init_ops"
  crash: this kernel may be configured with CONFIG_STRICT_DEVMEM, which
         renders /dev/mem unusable as a live memory source.
  crash: trying /proc/kcore as an alternative to /dev/mem

  crash: seek error: kernel virtual address: ffffffff81e29ff0  type: "pv_init_ops"
  crash: seek error: kernel virtual address: ffffffff82166130  type: "shadow_timekeeper xtime_sec"
  crash: seek error: kernel virtual address: ffffffff81e0d304  type: "init_uts_ns"
  crash: usr/lib/debug/boot/vmlinux-4.8.0-34-generic and /var/crash/201701191308/dump.201701191308 do not match!

  With the fix, the crash command works as expected.

  Running the crash tool on a vmcore file produced by makedumpfile may
  return:

  crash: page excluded: kernel virtual address: <> type: "fill_task_struct"

  [Regression]
  None expected, as these modifications are already part of the Zesty and
  upstream versions.

  The makedumpfile patches are in Yakkety and Zesty, version 1.6.0 and later.

  [Original description of the problem]
  vmcore captured by kdump cannot be opened with crash:

  % sudo crash -d1 /usr/lib/debug/boot/vmlinux-4.8.0-34-generic /var/crash/201612282137/dump.201612282137
  ... ...
  base kernel version: 0.8.0
  linux_banner:
  ????????
  crash: /usr/lib/debug/boot/vmlinux-4 and /var/crash/201612282137/dump.201612282137 do not match!

  Usage:

    crash [OPTION]... NAMELIST MEMORY-IMAGE[@ADDRESS]     (dumpfile form)
    crash [OPTION]... [NAMELIST]                          (live system form)

  Enter "crash -h" for details.

  It looks like crash cannot parse 'linux_banner'.

  While the vmcore was being dumped, this message was shown:

  [  729.609196] kdump-tools[5192]: The kernel version is not supported.
  [  729.609447] kdump-tools[5192]: The makedumpfile operation may be incomplete.
  ---uname output---
  Linux roselp4 4.8.0-34-generic #36~16.04.1-Ubuntu SMP Wed Dec 21 18:53:20 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

  Machine Type = lpar

  ---Debugger---
  A debugger is not configured

  ---Steps to Reproduce---
  1. Configure kdump
  2. Trigger kdump
  3. Analyse the vmcore with crash
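
  The three steps above might look roughly like this on an Ubuntu system
  with kdump-tools. This is only a sketch: the function names are made up
  for illustration, and the linux-crashdump metapackage and the sysrq
  trigger are assumptions about how kdump was set up and triggered (the
  report does not say). trigger_kdump deliberately panics the machine, so
  the block only defines functions and runs nothing by itself.

```shell
#!/bin/sh
# Hypothetical helpers; none of these names come from the bug report.

# Path of the debug vmlinux for a kernel release, e.g. 4.8.0-34-generic.
debug_vmlinux_path() {
    printf '/usr/lib/debug/boot/vmlinux-%s\n' "$1"
}

# Path of a kdump-tools dump for a timestamp, e.g. 201612282137.
dump_path() {
    printf '/var/crash/%s/dump.%s\n' "$1" "$1"
}

# Step 1: install kdump support and show its state. A reboot is assumed
# to be needed afterwards so the crash kernel memory gets reserved.
configure_kdump() {
    sudo apt-get install -y linux-crashdump
    sudo kdump-config show
}

# Step 2: WARNING, this panics the running kernel on purpose.
trigger_kdump() {
    echo 1 | sudo tee /proc/sys/kernel/sysrq >/dev/null
    echo c | sudo tee /proc/sysrq-trigger >/dev/null
}

# Step 3: open the dump with crash. $1 = kernel release, $2 = timestamp.
analyse_vmcore() {
    sudo crash "$(debug_vmlinux_path "$1")" "$(dump_path "$2")"
}
```

  For the dump in this report, step 3 would correspond to calling
  analyse_vmcore 4.8.0-34-generic 201612282137.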

  Userspace tool common name: crash/makedumpfile

  The userspace tool has the following bit modes: 64-bit

  Userspace rpm: makedumpfile 1.5.9-5ubuntu0.3/crash 7.1.4-1ubuntu4

  Userspace tool obtained from project website:  na

  *Additional Instructions for Ping Tian Han/pt...@cn.ibm.com:
  -Post a private note with access information to the machine that the bug is occurring on.
  -Attach ltrace and strace of userspace application.

  xtime timespec.tv_sec: 586481e8: Wed Dec 28 21:24:24 2016
  utsname:
       sysname: Linux
      nodename: boblp1
       release: 4.8.0-32-generic
       version: #34~16.04.1-Ubuntu SMP Tue Dec 13 17:01:57 UTC 2016
       machine: ppc64le
    domainname: (none)
  base kernel version: 4.8.0
  verify_namelist:
  dumpfile /proc/version:
  Linux version 4.8.0-32-generic (buildd@bos01-ppc64el-001) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #34~16.04.1-Ubuntu SMP Tue Dec 13 17:01:57 UTC 2016 (Ubuntu 4.8.0-32.34~16.04.1-generic 4.8.11)
  /usr/lib/debug/boot/vmlinux-4.8.0-32-generic:
  Linux version 4.8.0-32-generic (buildd@bos01-ppc64el-001) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #34~16.04.1-Ubuntu SMP Tue Dec 13 17:01:57 UTC 2016 (Ubuntu 4.8.0-32.34~16.04.1-generic 4.8.11)

  hypervisor: (undetermined)
  crash: per_cpu_symbol_search(per_cpu__tvec_bases): NULL
  ppc64_vmemmap_init: vmemmap base: f000000000000000

  crash: PPC64: cannot find 'cpu_possible_map', 'cpu_present_map',
  'cpu_online_map' or 'cpu_active_map' symbols

  root@boblp1:/usr/lib/debug/boot# uname -a
  Linux boblp1 4.8.0-32-generic #34~16.04.1-Ubuntu SMP Tue Dec 13 17:01:57 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
  root@boblp1:/usr/lib/debug/boot#

  1. Missing v4.8 support related patches in crash tool

     commit 098cdab16dfa6a85e9dad2cad604dee14ee15f66
     Author: Dave Anderson <ander...@redhat.com>
     Date:   Fri Feb 12 14:32:53 2016 -0500

      Fix for the changes made to the kernel module structure introduced by
      this kernel commit for Linux 4.5 and later kernels:

        commit 8244062ef1e54502ef55f54cced659913f244c3e
        modules: fix longstanding /proc/kallsyms vs module insertion race.

      Without the patch, the crash session fails during initialization
      with the error message: "crash: invalid structure member offset:
      module_num_symtab".
      (ander...@redhat.com)

     commit 6f1f78e33474d00d5f261d7ed9d835c558b34d61
     Author: Dave Anderson <ander...@redhat.com>
     Date:   Wed Jan 20 09:56:36 2016 -0500

      Fix for the changes made to the kernel module structure introduced by
      this kernel commit for Linux 4.5 and later kernels:

        commit 7523e4dc5057e157212b4741abd6256e03404cf1
        module: use a structure to encapsulate layout.

      Without the patch, the crash session fails during initialization
      with the error message: "crash: invalid structure member offset:
      module_init_text_size".
      (seb...@linux.vnet.ibm.com)

     commit 1e92f9fad3a7e3042b16996306cb2335760ef8c8
     Author: Dave Anderson <ander...@redhat.com>
     Date:   Mon Feb 1 16:10:49 2016 -0500

      Fix for the replacements made to the kernel's cpu_possible_mask,
      cpu_online_mask, cpu_present_mask and cpu_active_mask symbols in
      this kernel commit for Linux 4.5 and later kernels:

        commit 5aec01b834fd6f8ca49d1aeede665b950d0c148e
        kernel/cpu.c: eliminate cpu_*_mask

      Without the patch, behavior is architecture-specific, dependent upon
      whether the cpu mask values are used to calculate the number of cpus.
      For example, ARM64 crash sessions fail during session initialization
      with the error message "crash: zero-size memory allocation! (called
      from <address>)", whereas X86_64 sessions come up normally, but
      cpu mask values of zero are stored internally.
      (ander...@redhat.com)

     commit 182914debbb9a2671ef644027fedd339aa9c80e0
     Author: Dave Anderson <ander...@redhat.com>
     Date:   Fri Sep 23 09:09:15 2016 -0400

      With the introduction of radix MMU in Power ISA 3.0, there are
      changes in kernel page table management accommodating it.  This patch
      series makes appropriate changes here to work for such kernels.
      Also, this series fixes a few bugs along the way:

        ppc64: fix vtop page translation for 4K pages
        ppc64: Use kernel terminology for each level in 4-level page table
        ppc64/book3s: address changes in kernel v4.5
        ppc64/book3s: address change in page flags for PowerISA v3.0
        ppc64: use physical addresses and unfold pud for 64K page size
        ppc64/book3s: support big endian Linux page tables

      The patches are needed for Linux v4.5 and later kernels on all
      ppc64 hardware.

     commit 8ceb1ac628bf6a0a7f0bbfff030ec93081bca4cd
     Author: Dave Anderson <ander...@redhat.com>
     Date:   Mon May 23 11:23:01 2016 -0400

      Fix for Linux commit 0139aa7b7fa12ceef095d99dc36606a5b10ab83a, which
      renamed the page._count member to page._refcount.  Without the patch,
      certain "kmem" commands fail with the "kmem: invalid structure member
      offset: page_count".
      (ander...@redhat.com)

     commit 7136bf8495948cb059e5595b8503f8ae37019fa1
     Author: Dave Anderson <ander...@redhat.com>
     Date:   Thu May 19 14:01:19 2016 -0400

      Fix for Linux commit edf14cdbf9a0e5ab52698ca66d07a76ade0d5c46, which
      has appended a NULL entry as the final member of the pageflag_names[]
      array.  Without the patch, a message that indicates "crash: failed to
      read pageflag_names entry" is displayed during session initialization
      in Linux 4.6 kernels.
      (andrej.skvort...@gmail.com)

  2. The following makedumpfile commits are needed:

     commit 5bc1f520cc7ab6e18abdd5af21c80ecda6339eb5
     Author: Atsushi Kumagai <ats-kuma...@wm.jp.nec.com>
     Date:   Tue Jan 26 10:11:33 2016 +0900

      [PATCH] Looking for page.compound_order/compound_dtor to exclude hugepages

      * Required for kernel 4.4

      Due to some changes in struct page, hugepages wouldn't be removed on
      linux 4.4. makedumpfile reads page.lru.prev to get "order" (number of
      hugepages) and page.lru.next to get "dtor" (destructor for hugepages)
      to detect hugepages, but the offsets of the two were changed in
      linux 4.4.

        kernel version |      where is order       |       where is dtor
       ----------------+---------------------------+---------------------------
              - v3.19  |         lru.prev          |          lru.next
         v4.0 - v4.3   | compound_order(=lru.prev) | compound_dtor(=lru.next)
         v4.4 -        | compound_order            | compound_dtor

      As above, OFFSET(page.compound_order) and OFFSET(page.compound_dtor) are
      definitely necessary in VMCOREINFO on linux 4.4 and later.

      Further, the content of page.compound_dtor was changed from direct address
      of dtor to the ID of it in linux 4.4.

      Signed-off-by: Atsushi Kumagai <ats-kuma...@wm.jp.nec.com>
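
  The version table in the commit message above can be read as a simple
  lookup. The sketch below encodes it as a shell function; the function
  name is hypothetical, and this is not how makedumpfile itself (which is
  written in C) implements the check.

```shell
#!/bin/sh
# Hypothetical lookup mirroring the table in the commit message:
# prints where "order" and "dtor" live for a given kernel version.
hugepage_field_source() {
    hfs_ver=${1%%-*}            # drop a suffix such as "-34-generic"
    hfs_major=${hfs_ver%%.*}    # "4.8.0" -> "4"
    hfs_rest=${hfs_ver#*.}      # "4.8.0" -> "8.0"
    hfs_minor=${hfs_rest%%.*}   # "8.0"   -> "8"

    if [ "$hfs_major" -lt 4 ]; then
        # up to v3.19
        echo "lru.prev lru.next"
    elif [ "$hfs_major" -eq 4 ] && [ "$hfs_minor" -le 3 ]; then
        # v4.0 to v4.3
        echo "compound_order(=lru.prev) compound_dtor(=lru.next)"
    else
        # v4.4 and later
        echo "compound_order compound_dtor"
    fi
}
```

  For the 4.8.0-34-generic kernel in this report, the function lands in
  the last row of the table, which is why the compound_order and
  compound_dtor offsets are needed.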

     commit 13b4233e91a9d5aa14c4b0643af36cbc29b9fa7a
     Author: Atsushi Kumagai <ats-kuma...@wm.jp.nec.com>
     Date:   Wed Feb 24 17:09:44 2016 +0900

      [PATCH] Skip examining compound tail pages

      * Required for kernel 4.5

      For filtering user pages, we check whether each page's
      page->mapping has the PAGE_MAPPING_ANON bit set.
      However, unexcludable compound tail pages can have
      PAGE_MAPPING_ANON since kernel 4.5, so they can be wrongly
      excluded as user pages.

      Now, we don't need to check compound tail pages because
      excludable compound pages must be excluded at a time by
      exclude_range() when the corresponding head page is checked.
      So just skipping tail pages can avoid wrong filtering.

      Signed-off-by: Atsushi Kumagai <ats-kuma...@wm.jp.nec.com>

  3. The linux-image dbgsym version installed must be pulled from a different
     repo instead of the one meant for 16.04.2, because the gcc version of the
     kernel image (/boot/vmlinux-4.8.0-34-generic) and that of the vmlinux with
     debug symbols (/usr/lib/debug/boot/vmlinux-4.8.0-34-generic) don't match.

     Please use the following repos

       sudo tee /etc/apt/sources.list.d/ddebs.list << EOF
       deb http://ddebs.ubuntu.com/ $(lsb_release -cs)          main restricted universe multiverse
       deb http://ddebs.ubuntu.com/ $(lsb_release -cs)-security main restricted universe multiverse
       deb http://ddebs.ubuntu.com/ $(lsb_release -cs)-updates  main restricted universe multiverse
       deb http://ddebs.ubuntu.com/ $(lsb_release -cs)-proposed main restricted universe multiverse
       EOF

     to install linux-image-4.8.0-34-generic-dbgsym package.
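
  To check whether an installed dbgsym vmlinux was built with the same
  compiler as the running kernel, the "Linux version" banners can be
  compared. The helpers below are a hedged sketch: the function names are
  hypothetical, and vmlinux_banner assumes that the first string in the
  file starting with "Linux version " is the kernel banner.

```shell
#!/bin/sh
# Hypothetical helpers for comparing "Linux version ..." banners.

# Print the gcc version number embedded in a kernel banner string.
banner_gcc_version() {
    printf '%s\n' "$1" | sed -n 's/.*(gcc version \([0-9][0-9.]*\).*/\1/p'
}

# Succeed only if both banners are identical strings.
banners_match() {
    [ "$1" = "$2" ]
}

# Print the first "Linux version" banner found in a vmlinux file.
# Assumption: that string is the kernel's linux_banner.
vmlinux_banner() {
    strings "$1" | grep -m 1 '^Linux version '
}
```

  A possible use would be to compare the contents of /proc/version with
  the output of vmlinux_banner for
  /usr/lib/debug/boot/vmlinux-4.8.0-34-generic via banners_match, and to
  print banner_gcc_version for each side when they differ.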

  Thanks

  [snip]

  >
  > 3. The linux-image dbgsym version installed must be pulled from a different
  > repo

  s/must be pulled/must have been pulled/

  Applied crash utility's missing patches on top of
  crash-7.1.4-1ubuntu4 and makedumpfile tool's missing patches on top of
  makedumpfile-1.5.9-5ubuntu0.3. Did some sanity testing of the
  patched binaries. The binaries were working as expected.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1655625/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
