** Changed in: kexec-tools (Ubuntu)
     Assignee: Taco Screen team (taco-screen-team) => dann frazier (dannf)

** Changed in: kexec-tools (Ubuntu)
       Status: New => In Progress

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to kexec-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1661168

Title:
  In Ubuntu 16.10: Kdump stuck in boot for a long time, need to force
  reboot via HMC on 32TB Brazos system

Status in kexec-tools package in Ubuntu:
  In Progress
Status in kexec-tools source package in Yakkety:
  New

Bug description:
  Problem Description
  ===========================
    On Ubuntu 16.10, we tried kdump on a Brazos system (32TB memory, 192 cores).
  When a panic is triggered, the kdump process gets stuck during boot and a forced
  reboot is needed. After the reboot, the system had captured only a vmcore-incomplete file.

  Reproducible Step:
  ======================
  1- Install Ubuntu 16.10
  2- Boot the system with 31TB of memory and 192 cores
  3- Configure kdump on the system
  4- Verify that kdump is running on the system
  5- Trigger a panic on the system

  Actual Result
  --------------------------
  The kdump process gets stuck during boot and a forced reboot is needed.

  Expected Result 
  -----------------------------
  Kdump proceeds and the vmcore is captured successfully.

  LOG:

  root@ltc-brazos1:~# cat /proc/cmdline
  BOOT_IMAGE=/boot/vmlinux-4.4.0-30-generic root=UUID=516c4b1b-6700-4b55-bd37-d61c4c5af6af ro quiet splash crashkernel=4096M
  root@ltc-brazos1:~# kdump-config show
  DUMP_MODE:        kdump
  USE_KDUMP:        1
  KDUMP_SYSCTL:     kernel.panic_on_oops=1
  KDUMP_COREDIR:    /var/crash
  crashkernel addr:
     /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinux-4.4.0-30-generic
  kdump initrd:
     /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.4.0-30-generic
  current state:    ready to kdump

  kexec command:
    /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinux-4.4.0-30-generic root=UUID=516c4b1b-6700-4b55-bd37-d61c4c5af6af ro quiet splash irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
  root@ltc-brazos1:~#
  root@ltc-brazos1:~# dpkg -l | grep kdump
  ii  kdump-tools                        1:1.6.0-2                         all          scripts and tools for automating kdump (Linux crash dumps)
  root@ltc-brazos1:~# 
  root@ltc-brazos1:~# echo c > /proc/sysrq-trigger 

  
  ltc-brazos1 login: [  416.229464] sysrq: SysRq : Trigger a crash
  [  416.229496] Unable to handle kernel paging request for data at address 0x00000000
  [  416.229502] Faulting instruction address: 0xc000000000670014
  [  416.229508] Oops: Kernel access of bad area, sig: 11 [#1]
  [  416.229511] SMP NR_CPUS=2048 NUMA pSeries
  [  416.229517] Modules linked in: pseries_rng btrfs xor raid6_pq rtc_generic sunrpc autofs4 ses enclosure ipr
  [  416.229532] CPU: 65 PID: 404785 Comm: bash Not tainted 4.4.0-30-generic #49-Ubuntu
  [  416.229537] task: c00001f9d583c8e0 ti: c00001fa13cd8000 task.ti: c00001fa13cd8000
  [  416.229543] NIP: c000000000670014 LR: c0000000006710c8 CTR: c00000000066ffe0
  [  416.229548] REGS: c00001fa13cdb990 TRAP: 0300   Not tainted  (4.4.0-30-generic)
  [  416.229552] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28242222  XER: 00000001
  [  416.229565] CFAR: c000000000008468 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
  GPR00: c0000000006710c8 c00001fa13cdbc10 c0000000015b5d00 0000000000000063
  GPR04: c00001fab9049c50 c00001fab905b4e0 c0001f3fff3d0000 0000000000000313
  GPR08: 0000000000000007 0000000000000001 0000000000000000 c0001f3fff3dec68
  GPR12: c00000000066ffe0 c000000007546980 ffffffffffffffff 0000000022000000
  GPR16: 0000000010170dc8 00000100174901d8 0000000010140f58 00000000100c7570
  GPR20: 0000000000000000 000000001017dd58 0000000010153618 000000001017b608
  GPR24: 00003ffff8966c94 0000000000000001 c0000000014f8e58 0000000000000004
  GPR28: c0000000014f9218 0000000000000063 c0000000014b11dc 0000000000000000
  [  416.229631] NIP [c000000000670014] sysrq_handle_crash+0x34/0x50
  [  416.229636] LR [c0000000006710c8] __handle_sysrq+0xe8/0x270
  [  416.229640] Call Trace:
  [  416.229645] [c00001fa13cdbc10] [c000000000e08f28] _fw_tigon_tg3_bin_name+0x2ce58/0x342b0 (unreliable)
  [  416.229652] [c00001fa13cdbc30] [c0000000006710c8] __handle_sysrq+0xe8/0x270
  [  416.229658] [c00001fa13cdbcd0] [c000000000671868] write_sysrq_trigger+0x78/0xa0
  [  416.229666] [c00001fa13cdbd00] [c00000000037ae30] proc_reg_write+0xb0/0x110
  [  416.229673] [c00001fa13cdbd50] [c0000000002e186c] __vfs_write+0x6c/0xe0
  [  416.229678] [c00001fa13cdbd90] [c0000000002e25a0] vfs_write+0xc0/0x230
  [  416.229684] [c00001fa13cdbde0] [c0000000002e35dc] SyS_write+0x6c/0x110
  [  416.229690] [c00001fa13cdbe30] [c000000000009204] system_call+0x38/0xb4
  [  416.229695] Instruction dump:
  [  416.229698] 38425d20 7c0802a6 f8010010 f821ffe1 60000000 60000000 3d220019 394931e4
  [  416.229707] 39200001 912a0000 7c0004ac 39400000 <992a0000> 38210020 e8010010 7c0803a6
  [  416.229717] ---[ end trace 16e5fbbf7faa7340 ]---
  [  416.232059]
  [  416.232086] Sending IPI to other CPUs
  [  416.242558] IPI complete
  I'm in purgatory
   -> smp_release_cpus()
  spinning_secondaries = 1528
   <- smp_release_cpus()
   <- setup_system()
  [    1.146155] sd 0:2:1:0: [sdb] Assuming drive cache: write through
  [    1.154176] sd 0:2:0:0: [sda] Assuming drive cache: write through
  /dev/sdb2: recovering journal
  /dev/sdb2: clean, 69482/136331264 files, 9047821/545318400 blocks

  
  [The console then repeatedly printed the Ubuntu 16.10 boot-splash text ("Ubuntu 16.10" followed by progress dots), interleaved with terminal escape-sequence residue, for the rest of the log.]

  After the forced reboot:

  root@ltc-brazos1:/var/crash# ls
  201607161510  kexec_cmd
  root@ltc-brazos1:/var/crash# cd 201607161510/
  root@ltc-brazos1:/var/crash/201607161510# ls
  vmcore-incomplete
  root@ltc-brazos1:

  Note: waited for the kdump process for more than 2 hours.

  Regards
  Praveen

  == Comment: #12 - Vaishnavi Bhat <vaish...@in.ibm.com> - 2016-09-16 02:40:20 ==
  root@ltc-brazos1:~# kdump-config show 
  DUMP_MODE:        kdump
  USE_KDUMP:        1
  KDUMP_SYSCTL:     kernel.panic_on_oops=1
  KDUMP_COREDIR:    /var/crash
  crashkernel addr: 
     /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinux-4.4.0-9136-generic
  kdump initrd: 
     /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.4.0-9136-generic
  current state:    ready to kdump

  kexec command:
    /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinux-4.4.0-9136-generic root=UUID=bfdd4041-1b2f-42b1-b202-2c09f781bbcc ro quiet splash irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz

  root@ltc-brazos1:~# cat /proc/cmdline
  BOOT_IMAGE=/boot/vmlinux-4.4.0-9136-generic root=UUID=bfdd4041-1b2f-42b1-b202-2c09f781bbcc ro crashkernel=4096M quiet splash crashkernel=4096M

  root@ltc-brazos1:~# dmesg  | grep -i crash
  [    0.000000] Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 31744000MB)
  [    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinux-4.4.0-9136-generic root=UUID=bfdd4041-1b2f-42b1-b202-2c09f781bbcc ro crashkernel=4096M quiet splash crashkernel=4096M

  == Comment: #26 - Hari Krishna Bathini <hbath...@in.ibm.com> - 2017-02-01 02:02:36 ==
  The following kexec-tools commit is needed to fix this issue:

    commit f63d8530b9b6a2d7e79b946e326e5a2197eb8f87
    Author: Petr Tesarik <ptesa...@suse.com>
    Date:   Thu Jan 19 18:37:09 2017 +0100

      ppc64: Reduce number of ELF LOAD segments
      
      The number of program header table entries (e_phnum) is an Elf64_Half,
      which is a 16-bit entity, i.e. the limit is 65534 entries (one entry is
      reserved for NOTE). This is a hard limit, defined by the ELF standard.
      It is possible that more LMBs (Logical Memory Blocks) are needed to
      represent all RAM on some machines, and this field overflows, causing
      an incomplete /proc/vmcore file.
      
      This has actually happened on a machine with 31TB of RAM and an LMB size
      of 256MB.
      
      However, since there is usually no memory hole between adjacent LMBs, the
      map can be "compressed", combining multiple adjacent LMBs into a single
      LOAD segment.
      
      Signed-off-by: Petr Tesarik <ptesa...@suse.com>
      Signed-off-by: Simon Horman <ho...@verge.net.au>
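  As a rough check of the numbers (a sketch, not the kexec-tools code): the dmesg
  output in comment #12 reports 31744000MB of System RAM, and with the 256MB LMB size
  quoted in the commit message that is 31744000 / 256 = 124000 LMBs, well over the
  65534 program-header limit. The Python sketch below assumes one LOAD segment per
  LMB before the fix and models the commit's idea of merging adjacent LMBs into a
  single segment. The `coalesce` and `fits_in_elf` helpers and the single hole
  position are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: a memory range is (start, end), end exclusive.
Range = Tuple[int, int]

# Limit quoted in the commit message: at most 65534 program headers,
# one of which is reserved for the NOTE segment.
PHNUM_LIMIT = 65534


def coalesce(ranges: List[Range]) -> List[Range]:
    """Merge adjacent or overlapping ranges into as few ranges as possible."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def fits_in_elf(n_load: int) -> bool:
    """True if n_load LOAD headers plus one NOTE header fit the limit."""
    return n_load + 1 <= PHNUM_LIMIT


MB = 1 << 20
LMB_SIZE = 256 * MB          # LMB size quoted in the commit message
TOTAL_RAM = 31_744_000 * MB  # "System RAM" from the dmesg output in comment #12

n_lmbs = TOTAL_RAM // LMB_SIZE
# One range per LMB, leaving out a single LMB to model one memory hole
# (the hole position is arbitrary, for illustration only).
hole = 1000
lmbs = [(i * LMB_SIZE, (i + 1) * LMB_SIZE) for i in range(n_lmbs) if i != hole]

merged = coalesce(lmbs)
print(n_lmbs, len(lmbs), fits_in_elf(len(lmbs)))  # 124000 123999 False
print(len(merged), fits_in_elf(len(merged)))      # 2 True
```

  Merging only helps because adjacent LMBs usually have no hole between them, as the
  commit message notes; a map with many scattered holes would still need one LOAD
  segment per contiguous run.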
