----- Original Message -----

> > > 
> > > With the zgetdump tool we create live dumps from /dev/mem or /dev/crash.
> > > These dumps get the LIVE_DUMP flag, indicating that the data is not
> > > consistent.
> > > 
> > > Besides this, we have two other non-disruptive live dump features:
> > > 
> > >   - VMDUMP for z/VM guests
> > >   - Virsh dump for KVM guests
> > > 
> > > In contrast to the zgetdump method, here the guest system is stopped
> > > to get consistent snapshots. Therefore I think it is fine to *not* set
> > > the LIVE_DUMP flag.
> > > 
> > > Besides those live dump mechanisms (and kdump), we have our stand-alone
> > > dump tools for DASD and SCSI. These dump methods are also "Linux
> > > independent" and therefore can produce dumps without panic tasks.
> > > 
> > > You can read more on s390 dump in the documents below:
> > > 
> > >  * http://www.vm.ibm.com/education/lvc/LVC1219.pdf
> > >  * http://www-01.ibm.com/support/knowledgecenter/linuxonibm/liaaf/lnz_r_dt.html?cp=linuxonibm%2F0-4-0-1
> > > 
> > > Michael
> > 
> > OK, so from what I understand, there can still be s390x dumpfiles
> > which have no indication of the panic task or cpu (if there is one)
> > in their headers. For those, crash may try the "bt -r" type search
> > of the active tasks via raw_stack_dump() in get_active_set_panic_task(),
> > and if that fails, fall back to the "bt -t" search of all tasks in
> > panic_search().
> > 
> > In those cases, I suppose you could:
> > 
> >  (1) restrict the raw_stack_dump() parameters in
> >      get_active_set_panic_task() to exclude the user register dump
> >      at the top of the stack, and
> >  (2) plug in a MACHDEP_BT_TEXT handler for the s390x instead of using
> >      the generic version, which could prevent the search from
> >      entering the user-space register dump at the top of the stack, or
> > (2a) replace "bt -t" with just "bt" in panic_search() for s390x, as you
> >      did in the original patch.
> > 
> > But (1) and (2) are not fool-proof, because even the kernel-only part
> > of the stack could simply contain "numbers" that by dumb luck fall
> > into the zero-based virtual address range of panic, crash_kexec, etc.,
> > and return a false positive. So I don't know how that can be made
> > absolutely reliable.
> 
> I still would prefer 2a. See patch below.

OK, that's fine with me.

> 
> > 
> > But at least with dumpfiles that have the live dump magic number (and
> > I'm still not clear which of the 4 types do so),
> 
> Only the zgetdump live dump gets the live dump magic number.

OK, thanks for the clarification -- I'll update the changelog to indicate that.

Queued for crash-7.1.3:

  
https://github.com/crash-utility/crash/commit/3c2fc5f2a027fe192327101cdc6db0e24a4794d9

Thanks,
  Dave




> > the simple LIVE_DUMP-check patch covers
> > them.  I'm not sure whether it's worth doing anything beyond that.
> ---
> crash: Do not use bt -t flag in panic_search()
> 
> On s390 we got a dump where a process "gmain" was incorrectly marked as
> running panic task:
> 
> crash> ps | grep gmain
> >   217      1   5      8bec23420     IN   0.0  463276  18240  gmain
> 
> The reason was that the "brute force" method of parsing the "bt -t -o"
> output in panic_search() found the symbol "panic" on the stack:
> 
> crash> bt -t -o 8bec23420
> PID: 217    TASK: 8bec23420         CPU: 5   COMMAND: "gmain"
>               START: __schedule at 83f650
>   [       8b662b900] (null) at 0
>   [       8b662b978] __schedule at 83f650
> ...
>   [       8b662bb18] (null) at 0
>   [       8b662bb40] panic at 83679a  <<<<<--------------
> 
> The real stack trace was as follows:
> 
> crash> bt  8bec23420
> Detaching after fork from child process 15508.
> PID: 217    TASK: 8bec23420         CPU: 5   COMMAND: "gmain"
>  #0 [8b662b8f0] __schedule at 83f650
>  #1 [8b662b958] schedule at 83fade
>  #2 [8b662b970] schedule_hrtimeout_range_clock at 842fc8
>  #3 [8b662ba10] poll_schedule_timeout at 2c6e8a
>  #4 [8b662ba30] do_sys_poll at 2c8604
>  #5 [8b662be40] sys_poll at 2c8852
>  #6 [8b662bea8] system_call at 843a66
> 
> The value 0x83679a (panic at 83679a) was a local variable on the stack
> and was incorrectly interpreted as a function call to "panic".
> 
> For s390 in particular, there are dump methods, e.g. VMDUMP or stand-alone
> dump, where the "bt -t -o" method will be used to find the panic task.
> Because of this, and because the "-t" method is quite risky, we use the
> "normal" stack backtrace without the "-t" bt option for s390.
> 
> Signed-off-by: Michael Holzheu <[email protected]>
> ---
>  task.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> --- a/task.c
> +++ b/task.c
> @@ -6633,7 +6633,11 @@ panic_search(void)
>          fd = &foreach_data;
>       fd->keys = 1;
>       fd->keyword_array[0] = FOREACH_BT;
> +#ifdef S390X
> +     fd->flags |= FOREACH_o_FLAG;
> +#else
>       fd->flags |= (FOREACH_t_FLAG|FOREACH_o_FLAG);
> +#endif
>  
>       dietask = lasttask = NO_TASK;
>       
> 

--
Crash-utility mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/crash-utility
