Hello Dave,

I got a kernel freeze yesterday and was able to open the memory image
successfully with the crash utility.

crash> sys
      KERNEL: ./usr/lib/debug/usr/lib/modules/4.14.19-coreos/vmlinux
    DUMPFILE: gt-Server02-gmt-612746ca.vmss
        CPUS: 70
        DATE: Wed Feb 21 14:53:20 2018
      UPTIME: 1 days, 11:52:25
LOAD AVERAGE: 70.70, 30.98, 12.88
       TASKS: 2312
    NODENAME: gt-Server02-gmt.com
     RELEASE: 4.14.19-coreos
     VERSION: #1 SMP Wed Feb 14 03:18:05 UTC 2018
     MACHINE: x86_64  (2094 Mhz)
      MEMORY: 60 GB
       PANIC: ""
crash>

Could you please point me to a couple of things I should check after a
kernel freeze, before diving deep into finding the root cause?

Thank you,
Eshak

On Wed, Feb 7, 2018 at 7:12 PM, Eshak <tmdes...@gmail.com> wrote:

> Thank you for the quick info Dave.
>
> I'll deploy the main node with 'nokaslr' boot option and wait for a VM
> freeze.
>
> -Eshak
>
> On Wed, Feb 7, 2018 at 6:45 PM, anderson <ander...@prospeed.net> wrote:
>
>> -------- Original message --------
>> From: Eshak <tmdes...@gmail.com>
>> Date: 2/7/18 9:34 PM (GMT-05:00)
>> To: "Discussion list for crash utility usage, maintenance and
>> development" <crash-utility@redhat.com>
>> Subject: Re: [Crash-utility] linux_banner has garbage
>>
>> Hi Dave,
>>
>> In a test system I have booted the kernel with 'nokaslr' option. While
>> trying to check phys_base and KASLR:
>>
>> crash> help -m |grep phys_base
>>                 phys_base: 0
>>      text hit rate: 66% (5171 of 7801)
>> crash> help -k | grep relocate
>>       relocate: 0  (KASLR offset: 0 / 0MB)
>>      text hit rate: 66% (5171 of 7801)
>> crash>
>>
>> I'm not sure whether phys_base can be 0.
>>
>> Question: Are these values fine for reading memory images by specifying
>> --phys_base=0 after booting the main machine with the 'nokaslr' option?
>>
>> Yes, but since phys_base defaults to 0, the --machdep argument wouldn't
>> be necessary.
>>
>> Dave
>>
>>
>>
>> Thank you,
>> Eshak
>>
>> On Wed, Feb 7, 2018 at 10:49 AM, Dave Anderson <ander...@redhat.com>
>> wrote:
>>
>>>
>>>
>>> ----- Original Message -----
>>> > Hi Dave,
>>> >
>>> > Thanks for the info.
>>> > I've installed 7.2.0-1.fc28 and was able to run crash on live system.
>>> >
>>> > Unfortunately, KASLR is enabled.
>>>
>>> Yes, I'm afraid that is unfortunate.  I don't know how you can determine
>>> what the KASLR offset is, and without that, the dumpfile is pretty
>>> much useless.
>>>
>>> The best thing you can do is to prepare for the *next* crash by stashing
>>> the phys_base and KASLR offset values.  You can also boot the kernel
>>> with "nokaslr" on the boot command line.
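
A minimal sketch of what "stashing" the two values could look like in practice, assuming they were recorded with `help -m` and `help -k` while the system was still healthy. The helper name `crash_command` and the file names are hypothetical; `--kaslr` and `--machdep phys_base=` are the crash options this thread refers to. The numbers are the ones quoted further down in this thread and are only illustrative, since KASLR picks a new offset on every boot.

```python
import shlex


def crash_command(vmlinux, dumpfile, kaslr_offset=None, phys_base=None):
    """Build a crash command line from values stashed before the freeze.

    Hypothetical helper: it only formats the arguments, it does not run crash.
    """
    argv = ["crash"]
    if kaslr_offset is not None:
        # KASLR offset as reported by "help -k | grep relocate"
        argv += ["--kaslr", f"{kaslr_offset:#x}"]
    if phys_base is not None:
        # phys_base as reported by "help -m | grep phys_base"
        argv += ["--machdep", f"phys_base={phys_base:#x}"]
    argv += [vmlinux, dumpfile]
    return shlex.join(argv)


# Illustrative values only; they must come from the boot that later froze.
print(crash_command("vmlinux", "guest.vmss",
                    kaslr_offset=0x1f000000, phys_base=0x10d000000))
```

With a kernel booted with "nokaslr", both values are 0 and the plain `crash vmlinux guest.vmss` form should be enough.
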
>>>
>>> Dave
>>>
>>>
>>>
>>>
>>> >
>>> >
>>> > text hit rate: 66% (5171 of 7801)
>>> > help -m |grep phys_base
>>> > phys_base: 10d000000
>>> > text hit rate: 66% (5171 of 7801)
>>> > help -k | grep relocate
>>> > relocate: ffffffffe1000000 (KASLR offset: 1f000000 / 496MB)
>>> > text hit rate: 66% (5171 of 7801)
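
The `relocate` line above shows two numbers that appear to be the same quantity written two ways: the KASLR offset, and its 64-bit two's-complement negation. A small sketch of that arithmetic, using only the values printed above (the helper name `negate64` is made up for illustration):

```python
MASK64 = (1 << 64) - 1   # keep results within 64 bits
MIB = 1 << 20            # bytes per megabyte as crash prints it

KASLR_OFFSET = 0x1f000000          # from "KASLR offset: 1f000000"
RELOCATE = 0xffffffffe1000000      # from "relocate: ffffffffe1000000"


def negate64(value):
    """Two's-complement negation of a 64-bit unsigned value."""
    return (-value) & MASK64


print(hex(negate64(KASLR_OFFSET)))  # 0xffffffffe1000000
print(KASLR_OFFSET // MIB)          # 496
```

So a kernel whose symbols were shifted up by 0x1f000000 bytes (496MB) is reported with a `relocate` value of minus that amount.
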
>>> >
>>> > Is there any other info I can get from the vmem/vmss file, like the
>>> > processes running at the time, or tasks blocked on I/O, or anything?
>>> >
>>> > Thank you,
>>> > Eshak
>>> >
>>> > On Wed, Feb 7, 2018 at 6:28 AM, Dave Anderson <ander...@redhat.com> wrote:
>>> >
>>> >
>>> >
>>> >
>>> > ----- Original Message -----
>>> > > That's fixed upstream. You'll have to download the crash sources from
>>> > > github and build the latest and greatest.
>>> >
>>> > It's possible that you might be able to run the Fedora 28 rawhide
>>> > version here:
>>> >
>>> > Information for build crash-7.2.0-1.fc28
>>> > https://koji.fedoraproject.org/koji/buildinfo?buildID=978501
>>> >
>>> > That version has the fix for the init_level4_pgt issue. I'm not sure
>>> > whether you may run into anything else.
>>> >
>>> > Dave
>>> >
>>> >
>>> > >
>>> > > -------- Original message --------
>>> > > From: Eshak < tmdes...@gmail.com >
>>> > > Date: 2/6/18 9:27 PM (GMT-05:00)
>>> > > To: "Discussion list for crash utility usage, maintenance and development"
>>> > > <crash-utility@redhat.com>
>>> > > Subject: Re: [Crash-utility] linux_banner has garbage
>>> > >
>>> > > Hi Dave,
>>> > >
>>> > > I have /proc/kcore, but I'm getting a "cannot resolve init_level4_pgt"
>>> > > error.
>>> > >
>>> > > [root@gt-Server2-gmt proc]# crash /home/mfusion/vmem_vmss_jan26/usr/lib/debug/usr/lib/modules/4.14.11-coreos/vmlinux /proc/kcore
>>> > >
>>> > > crash 7.1.9-3.fc27
>>> > > Copyright (C) 2002-2016 Red Hat, Inc.
>>> > > Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
>>> > > Copyright (C) 1999-2006 Hewlett-Packard Co
>>> > > Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
>>> > > Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
>>> > > Copyright (C) 2005, 2011 NEC Corporation
>>> > > Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
>>> > > Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
>>> > > This program is free software, covered by the GNU General Public License,
>>> > > and you are welcome to change it and/or distribute copies of it under
>>> > > certain conditions. Enter "help copying" to see the conditions.
>>> > > This program has absolutely no warranty. Enter "help warranty" for details.
>>> > >
>>> > > crash: /dev/tty: No such device or address
>>> > > NOTE: stdin: not a tty
>>> > >
>>> > > GNU gdb (GDB) 7.6
>>> > > Copyright (C) 2013 Free Software Foundation, Inc.
>>> > > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>>> > > This is free software: you are free to change and redistribute it.
>>> > > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>>> > > and "show warranty" for details.
>>> > > This GDB was configured as "x86_64-unknown-linux-gnu"...
>>> > >
>>> > > WARNING: kernel relocated [496MB]: patching 69420 gdb minimal_symbol values
>>> > >
>>> > > crash: cannot resolve "init_level4_pgt"
>>> > >
>>> > > [root@gt-Server2-gmt proc]#
>>> > >
>>> > > But I believe this is fixed in crash 7.2. I have raised an issue against
>>> > > CoreOS to make crash 7.2 available in the toolbox packages
>>> > > (https://github.com/coreos/bugs/issues/2347).
>>> > >
>>> > > Meanwhile, is there any workaround for this?
>>> > >
>>> > > -Eshak
>>> > >
>>> > > On Tue, Feb 6, 2018 at 6:02 PM, anderson <ander...@prospeed.net> wrote:
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > To run live, you need either /dev/mem, /proc/kcore, or the /dev/crash
>>> > > driver. You could try "crash vmlinux /proc/kcore" to see if it's
>>> > > available. If not, you could try building the /dev/crash driver module.
>>> > > But I don't know if CoreOS offers a kernel-devel package that you could
>>> > > build the driver against? The driver source comes with the crash source
>>> > > package in the memory_driver subdirectory.
>>> > >
>>> > > Dave
>>> > >
>>> > >
>>> > > -------- Original message --------
>>> > > From: Eshak < tmdes...@gmail.com >
>>> > > Date: 2/6/18 8:35 PM (GMT-05:00)
>>> > > To: "Discussion list for crash utility usage, maintenance and development"
>>> > > <crash-utility@redhat.com>
>>> > > Cc: hfu <h...@vmware.com>
>>> > > Subject: Re: [Crash-utility] linux_banner has garbage
>>> > >
>>> > > Hi Dave,
>>> > >
>>> > > When trying to run crash live, I'm getting an error saying that /dev/mem
>>> > > is not available.
>>> > > I'm running crash from toolbox in a CoreOS VM. Is crash designed to run
>>> > > from a container?
>>> > >
>>> > > [root@gt-Server2-gmt ~]# crash -d8 /home/user/vmem_vmss_jan26/usr/lib/debug/usr/lib/modules/4.14.11-coreos/vmlinux
>>> > >
>>> > > crash 7.1.9-3.fc27
>>> > > Copyright (C) 2002-2016 Red Hat, Inc.
>>> > > Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
>>> > > Copyright (C) 1999-2006 Hewlett-Packard Co
>>> > > Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
>>> > > Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
>>> > > Copyright (C) 2005, 2011 NEC Corporation
>>> > > Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
>>> > > Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
>>> > > This program is free software, covered by the GNU General Public License,
>>> > > and you are welcome to change it and/or distribute copies of it under
>>> > > certain conditions. Enter "help copying" to see the conditions.
>>> > > This program has absolutely no warranty. Enter "help warranty" for details.
>>> > >
>>> > > get_live_memory_source: /dev/mem
>>> > >
>>> > > crash: /dev/mem: No such file or directory
>>> > >
>>> > > [root@gt-Server2-gmt ~]#
>>> > >
>>> > > Thank you,
>>> > > Eshak
>>> > >
>>> > > On Tue, Feb 6, 2018 at 3:05 PM, Eshak < tmdes...@gmail.com > wrote:
>>> > >
>>> > >
>>> > >
>>> > > Thanks for the info Dave.
>>> > > Unfortunately, I cannot run crash live on the machine because the VM is
>>> > > in a hung state right now. After resetting the VM (by tomorrow), I will
>>> > > check for KASLR and phys_base and try the suggested option.
>>> > >
>>> > > The complete output of crash is below:
>>> > >
>>> > >
>>> > > [root@gt-Server2-gmt user]# crash -d8 /home/mfusion/vmem_vmss_jan26/usr/lib/debug/usr/lib/modules/4.14.11-coreos/vmlinux /home/mfusion/vmem_vmss_jan26/usr/lib/modules/4.14.11-coreos/build/System.map /home/mfusion/vmem_vmss_jan26/gt-Server2-gmt-612746ca.vmss
>>> > >
>>> > > crash 7.1.9-3.fc27
>>> > > Copyright (C) 2002-2016 Red Hat, Inc.
>>> > > Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
>>> > > Copyright (C) 1999-2006 Hewlett-Packard Co
>>> > > Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
>>> > > Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
>>> > > Copyright (C) 2005, 2011 NEC Corporation
>>> > > Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
>>> > > Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
>>> > > This program is free software, covered by the GNU General Public License,
>>> > > and you are welcome to change it and/or distribute copies of it under
>>> > > certain conditions. Enter "help copying" to see the conditions.
>>> > > This program has absolutely no warranty. Enter "help warranty" for details.
>>> > >
>>> > > crash: diskdump / compressed kdump: dump does not have panic dump header
>>> > > crash: sadump: read dump device as media format
>>> > > crash: sadump: does not have partition header
>>> > > vmw: Header: id=bed2bed2 version=8 numgroups=95
>>> > > vmw: Checkpoint is 64-bit
>>> > > vmw: Group: Checkpoint offset=0x1dbc size=0x0x3ab.
>>> > > vmw: Group: GuestVars offset=0x2167 size=0x0xa3.
>>> > > vmw: Group: cpuid offset=0x220a size=0x0x5e0e.
>>> > > vmw: Group: cpu offset=0x8018 size=0x0x615bb.
>>> > > vmw: Group: BusMemSample offset=0x695d3 size=0x0x1c.
>>> > > vmw: Group: UUIDVMX offset=0x695ef size=0x0x2e.
>>> > > vmw: Group: StateLogger offset=0x6961d size=0x0x2.
>>> > > vmw: Group: memory offset=0x6961f size=0x0xa8.
>>> > > vmw: Item align_mask[0][0] => position=0x69633 size=0x4: 0000FFFF
>>> > > vmw: Item regionsCount => position=0x69645 size=0x4: 00000002
>>> > > vmw: Item regionPageNum[0] => position=0x6965c size=0x4: 00000000
>>> > > vmw: Item regionPPN[0] => position=0x6966f size=0x4: 00000000
>>> > > vmw: Item regionSize[0] => position=0x69683 size=0x4: 000C0000
>>> > > vmw: Item regionPageNum[1] => position=0x6969a size=0x4: 000C0000
>>> > > vmw: Item regionPPN[1] => position=0x696ad size=0x4: 00100000
>>> > > vmw: Item regionSize[1] => position=0x696c1 size=0x4: 00E40000
>>> > > vmw: Group: MStats offset=0x696c7 size=0x0x1936.
>>> > > vmw: Group: Snapshot offset=0x6affd size=0x0x4b9c.
>>> > > vmw: Group: pic offset=0x6fb99 size=0x0x511.
>>> > > vmw: Group: FTCpt offset=0x700aa size=0x0x2.
>>> > > vmw: Group: ide1:0 offset=0x700ac size=0x0x16e.
>>> > > vmw: Group: scsi0:0 offset=0x7021a size=0x0x46.
>>> > > vmw: Group: Migrate offset=0x70260 size=0x0x2.
>>> > > vmw: Group: TimeTracker offset=0x70262 size=0x0x99.
>>> > > vmw: Group: Backdoor offset=0x702fb size=0x0x2e.
>>> > > vmw: Group: PCI offset=0x70329 size=0x0x13.
>>> > > vmw: Group: Cs440bx offset=0x7033c size=0x0x40539.
>>> > > vmw: Group: ExtCfgDevice offset=0xb0875 size=0x0x30.
>>> > > vmw: Group: Floppy offset=0xb08a5 size=0x0x918c.
>>> > > vmw: Group: AcpiNotify offset=0xb9a31 size=0x0x1b.
>>> > > vmw: Group: vcpuHotPlug offset=0xb9a4c size=0x0xf5.
>>> > > vmw: Group: devHP offset=0xb9b41 size=0x0x86.
>>> > > vmw: Group: ACPIWake offset=0xb9bc7 size=0x0x1b.
>>> > > vmw: Group: DevicesPowerOn offset=0xb9be2 size=0x0x2.
>>> > > vmw: Group: PCIBridge0 offset=0xb9be4 size=0x0x272.
>>> > > vmw: Group: PCIBridge4 offset=0xb9e56 size=0x0x48e.
>>> > > vmw: Group: pciBridge4:1 offset=0xba2e4 size=0x0x48e.
>>> > > vmw: Group: pciBridge4:2 offset=0xba772 size=0x0x48e.
>>> > > vmw: Group: pciBridge4:3 offset=0xbac00 size=0x0x48e.
>>> > > vmw: Group: pciBridge4:4 offset=0xbb08e size=0x0x48e.
>>> > > vmw: Group: pciBridge4:5 offset=0xbb51c size=0x0x48e.
>>> > > vmw: Group: pciBridge4:6 offset=0xbb9aa size=0x0x48e.
>>> > > vmw: Group: pciBridge4:7 offset=0xbbe38 size=0x0x48e.
>>> > > vmw: Group: PCIBridge5 offset=0xbc2c6 size=0x0x48e.
>>> > > vmw: Group: pciBridge5:1 offset=0xbc754 size=0x0x48e.
>>> > > vmw: Group: pciBridge5:2 offset=0xbcbe2 size=0x0x48e.
>>> > > vmw: Group: pciBridge5:3 offset=0xbd070 size=0x0x48e.
>>> > > vmw: Group: pciBridge5:4 offset=0xbd4fe size=0x0x48e.
>>> > > vmw: Group: pciBridge5:5 offset=0xbd98c size=0x0x48e.
>>> > > vmw: Group: pciBridge5:6 offset=0xbde1a size=0x0x48e.
>>> > > vmw: Group: pciBridge5:7 offset=0xbe2a8 size=0x0x48e.
>>> > > vmw: Group: PCIBridge6 offset=0xbe736 size=0x0x48e.
>>> > > vmw: Group: pciBridge6:1 offset=0xbebc4 size=0x0x48e.
>>> > > vmw: Group: pciBridge6:2 offset=0xbf052 size=0x0x48e.
>>> > > vmw: Group: pciBridge6:3 offset=0xbf4e0 size=0x0x48e.
>>> > > vmw: Group: pciBridge6:4 offset=0xbf96e size=0x0x48e.
>>> > > vmw: Group: pciBridge6:5 offset=0xbfdfc size=0x0x48e.
>>> > > vmw: Group: pciBridge6:6 offset=0xc028a size=0x0x48e.
>>> > > vmw: Group: pciBridge6:7 offset=0xc0718 size=0x0x48e.
>>> > > vmw: Group: PCIBridge7 offset=0xc0ba6 size=0x0x48e.
>>>
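
The `memory` group items in the `vmw:` debug output above describe two guest memory regions. As a rough sketch, assuming `regionPPN` and `regionSize` are counted in 4 KiB pages (an assumption, not something the output states), the regions can be turned into byte ranges and summed. The `Region` type and field names are made up for illustration; the numbers are copied from the output.

```python
from typing import NamedTuple

PAGE_SIZE = 4096   # assumed x86_64 page size
GIB = 1 << 30


class Region(NamedTuple):
    page_num: int  # regionPageNum: first page index inside the dump
    ppn: int       # regionPPN: first guest physical page number
    size: int      # regionSize: length in pages


regions = [
    Region(page_num=0x00000000, ppn=0x00000000, size=0x000C0000),
    Region(page_num=0x000C0000, ppn=0x00100000, size=0x00E40000),
]

for r in regions:
    start = r.ppn * PAGE_SIZE
    end = start + r.size * PAGE_SIZE
    print(f"guest physical {start / GIB:g} GiB .. {end / GIB:g} GiB")

total_bytes = sum(r.size for r in regions) * PAGE_SIZE
print(f"total: {total_bytes / GIB:g} GiB")
```

Under that assumption the first region covers 0 to 3 GiB, the second starts at 4 GiB, and the total is 60 GiB, which appears to match the 60 GB that `sys` reports at the top of this thread.
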
>>
>> --
>> Crash-utility mailing list
>> Crash-utility@redhat.com
>> https://www.redhat.com/mailman/listinfo/crash-utility
>>
>
>