Hi Aureau,

On Tue, Apr 16, 2024 at 5:14 PM Aureau, Georges (Kernel Tools ERT)
<georges.aur...@hpe.com> wrote:
>
> Hello Tao Liu,
>
> We don't update machdep->machspec->irq_eframe_link as it is now more a 
> reference point for trying various link offsets (+8, -8, +16, -16) around it.
> This reference point should be viewed as the most likely link offset, and if 
> we fail, we try around it, but without updating this reference point,
> otherwise we lose our reference point and the offsets we would be trying next 
> would have a different meaning.
> Different processors may be caught in different IRQ/SYSVEC functions with 
> different framesizes,
> so there is no longer a unique (same for all IRQ/SYSVEC functions) link 
> offset, so we can't compute one eframe link for all.
> We are adapting the eframe link offset on the fly based on possible 
> IRQ/SYSVEC function framesizes, ie.
> most likely -32, then try -40, then -24, then -48, then -16.
> If we would update as we find a link, let's say -16, then next time around,
> most likely -16, then try -24, then -8, then -32, then 0, eg. we would lose 
> the -40 and -48.
> So if we have a most likely eframe link as a reference point, we should not 
> move this point.
> The comment in x86_64_irq_eframe_link() for the patch I'm suggesting is 
> saying:
> "keep ms->irq_eframe_link as the most likely value, and try a few sizes 
> around it."
> See below:

Thanks for the explanation, it sounds reasonable. OK I have no other
comments for this patch, so ack.

Thanks,
Tao Liu

> --
> diff --git a/x86_64.c b/x86_64.c
> index aec82b0..90c2a91 100644
> --- a/x86_64.c
> +++ b/x86_64.c
> @@ -6623,13 +6623,14 @@ x86_64_irq_eframe_link_init(void)
>
>  /*
>   *  Calculate and verify the IRQ exception frame location from the
> - *  stack reference at the top of the IRQ stack, possibly adjusting
> - *  the ms->irq_eframe_link value.
> + *  stack reference at the top of the IRQ stack, keep ms->irq_eframe_link
> + *  as the most likely value, and try a few sizes around it.
>   */
>  static ulong
>  x86_64_irq_eframe_link(ulong stkref, struct bt_info *bt, FILE *ofp)
>  {
>         ulong irq_eframe;
> +       int i, try[] = { 8, -8, 16, -16 };
>
>         if (x86_64_exception_frame(EFRAME_VERIFY, stkref, 0, bt, ofp))
>                 return stkref;
> @@ -6639,9 +6640,10 @@ x86_64_irq_eframe_link(ulong stkref, struct bt_info 
> *bt, FILE *ofp)
>         if (x86_64_exception_frame(EFRAME_VERIFY, irq_eframe, 0, bt, ofp))
>                 return irq_eframe;
>
> -       if (x86_64_exception_frame(EFRAME_VERIFY, irq_eframe+8, 0, bt, ofp)) {
> -               machdep->machspec->irq_eframe_link -= 8;
> -               return (irq_eframe + 8);
> +       for (i=0; i<sizeof(try)/sizeof(int); i++) {
> +               if (x86_64_exception_frame(EFRAME_VERIFY, irq_eframe+try[i], 
> 0, bt, ofp)) {
> +                       return (irq_eframe + try[i]);
> +               }
>         }
>
>         return irq_eframe;
> --
>
> Cheers,
> Georges
>
>
>
> -----Original Message-----
> From: Tao Liu <l...@redhat.com>
> Sent: Tuesday, April 16, 2024 3:19 AM
> To: Aureau, Georges (Kernel Tools ERT) <georges.aur...@hpe.com>
> Cc: HAGIO KAZUHITO(萩尾 一仁) <k-hagio...@nec.com>; Lianbo Jiang 
> <liji...@redhat.com>; devel@lists.crash-utility.osci.io
> Subject: Re: [Crash-utility] Re: [PATCH] x86_64: rhel 9.3: "crash" doesn't 
> handle all IRQ exception properly
>
> Hi Aureau,
>
> On Mon, Apr 15, 2024 at 11:37 PM Aureau, Georges (Kernel Tools ERT) 
> <georges.aur...@hpe.com> wrote:
> >
> > Hello Tao Liu, and Kazu,
> >
> > Thanks Kazu for your ack.
> >
> > Indeed, Tao Liu, you are right, we could factor the case 0 (most
> > likely case) into the try[] loop but this would make the code not as
> > readable. I would rather keep my original patch, where we leave case 0 as 
> > an explicit call (most likely case) outside of the try[] loop.
>
> OK, thanks for the explanation, I agree with your thoughts. One more question 
> which I still not very clear:
>
> -               machdep->machspec->irq_eframe_link -= 8;
>
> Why don't we update irq_eframe_link after the code change?
>
> Thanks,
> Tao Liu
>
>
> >
> > Thanks,
> > Georges
> >
> > -----Original Message-----
> > From: Tao Liu <l...@redhat.com>
> > Sent: Monday, April 15, 2024 2:33 PM
> > To: Aureau, Georges (Kernel Tools ERT) <georges.aur...@hpe.com>
> > Cc: Lianbo Jiang <liji...@redhat.com>; HAGIO KAZUHITO(萩尾 一仁)
> > <k-hagio...@nec.com>; devel@lists.crash-utility.osci.io
> > Subject: Re: [Crash-utility] Re: [PATCH] x86_64: rhel 9.3: "crash"
> > doesn't handle all IRQ exception properly
> >
> > Hi Aureau,
> >
> > On Mon, Apr 15, 2024 at 4:48 PM HAGIO KAZUHITO(萩尾 一仁)
> > <k-hagio...@nec.com> wrote:
> > >
> > > On 2024/04/10 22:47, Aureau, Georges (Kernel Tools ERT) wrote:
> > > > Hello Lianbo,
> > > >
> > > > In my case, the fix you suggested works properly as I've got a single 
> > > > CPU with an IRQ exception in a SYSVEC handler.
> > > > But it we have multiple CPUs with different IRQ exceptions, then
> > > > your suggestion does not work as
> > > > x86_64_irq_eframe_link() is changing irq_eframe_link, for example:
> > > >
> > > > x86_64_irq_eframe_link_init():
> > > >      set default irq_eframe_link=-24 since we have
> > > > sysvec_apic_timer_interrupt
> > > >
> > > > Then if we do "bt -c cpu" on several cpu X, Y, X, being of their 
> > > > irq_stack:
> > > >
> > > > x86_64_irq_eframe_link()/cpu X/rip:sysvec_apic_timer_interrupt:
> > > >      default irq_eframe_link=-24 is matching 
> > > > -framesize(rip:sysvec_apic_timer_interrupt)
> > > >      so we are ok
> > > >
> > > > x86_64_irq_eframe_link()/cpu Y/rip:common_interrupt:
> > > >      -24 does not work as framesize(common_interrupt)=32
> > > >      we try -24-8 = -32, it works, and we set irq_eframe_link=-32
> > > >
> > > > x86_64_irq_eframe_link()/cpu Z/rip:sysvec_apic_timer_interrupt:
> > > >      -32 does not work as framesize(rip:sysvec_apic_timer_interrupt)=24
> > > >      we try -32-8 = -40, it does not work, and we are back to square 
> > > > one !
> > > >      here we should try  -32+8 = -24
> > > >
> > > > We would need to have "bt -c cpu" doing runtime per cpu selection 
> > > > between -32 and -24 depending on 
> > > > framesize(*(pcpu_hot.hardirq_stack_ptr-8)).
> > > > So we can't really see "irq_eframe_link" as being a value valid for all 
> > > > cpu's / IRQ.
> > > > The "irq_eframe_link" should be per cpu, depending of what each cpu is 
> > > > executing (more below).
> > > >
> > > > Currently x86_64_irq_eframe_link() tries "stkref", then it tries
> > > > "stkref-irq_eframe_link" (with irq_eframe_link set in 
> > > > irq_eframe_link_init()), and then it tries "stkref-irq_eframe_link+8":
> > > >
> > > > static ulong
> > > > x86_64_irq_eframe_link(ulong stkref, struct bt_info *bt, FILE
> > > > *ofp) {
> > > >          ulong irq_eframe;
> > > >
> > > >          if (x86_64_exception_frame(EFRAME_VERIFY, stkref, 0, bt, ofp))
> > > >                  return stkref;
> > > >
> > > >          irq_eframe = stkref - machdep->machspec->irq_eframe_link;
> > > >
> > > >          if (x86_64_exception_frame(EFRAME_VERIFY, irq_eframe, 0, bt, 
> > > > ofp))
> > > >                  return irq_eframe;
> > > >
> > > >          if (x86_64_exception_frame(EFRAME_VERIFY, irq_eframe+8, 0, bt, 
> > > > ofp)) {
> > > >                  machdep->machspec->irq_eframe_link -= 8;
> > > >                  return (irq_eframe + 8);
> > > >          }
> > > >
> > > >          return irq_eframe;
> > > > }
> > > >
> > > >
> > > > I don't have the history as why we try "stkref-irq_eframe_link+8" ? Any 
> > > > idea ?
> > > >
> > > > But this current design with defaulting irq_eframe_link to -32 (or -24 
> > > > as per your suggestion) does not always work.
> > > >
> > > > We would need to handle both irq_stack rip's from IRQ functions  (eg. 
> > > > common_interrupt):
> > > >
> > > > #define DEFINE_IDTENTRY_IRQ(func)                                       
> > > > \
> > > > static void __##func(struct pt_regs *regs, u32 vector);                 
> > > > \
> > > >                                                                         
> > > >  \
> > > > __visible noinstr void func(struct pt_regs *regs,                       
> > > > \
> > > >                              unsigned long error_code)                  
> > > >  \
> > > > {                                                                       
> > > > \
> > > >          irqentry_state_t state = irqentry_enter(regs);                 
> > > >  \
> > > >          u32 vector = (u32)(u8)error_code;                              
> > > >  \
> > > >                                                                         
> > > >  \
> > > >          instrumentation_begin();                                       
> > > >  \
> > > >          kvm_set_cpu_l1tf_flush_l1d();                                  
> > > >  \
> > > >          run_irq_on_irqstack_cond(__##func, regs, vector);              
> > > >  \
> > > >          instrumentation_end();                                         
> > > >  \
> > > >          irqentry_exit(regs, state);                                    
> > > >  \
> > > > }                                                                       
> > > > \
> > > >
> > > > \ static noinline void __##func(struct pt_regs *regs, u32 vector)
> > > >
> > > >
> > > > And also to handle irq_stack rip's from SYSVEC functions  (eg. 
> > > > sysvec_apic_timer_interrupt):
> > > >
> > > > #define DEFINE_IDTENTRY_SYSVEC(func)                                    
> > > > \
> > > > static void __##func(struct pt_regs *regs);                             
> > > > \
> > > >                                                                         
> > > >  \
> > > > __visible noinstr void func(struct pt_regs *regs)                       
> > > > \
> > > > {                                                                       
> > > > \
> > > >          irqentry_state_t state = irqentry_enter(regs);                 
> > > >  \
> > > >                                                                         
> > > >  \
> > > >          instrumentation_begin();                                       
> > > >  \
> > > >          kvm_set_cpu_l1tf_flush_l1d();                                  
> > > >  \
> > > >          run_sysvec_on_irqstack_cond(__##func, regs);                   
> > > >  \
> > > >          instrumentation_end();                                         
> > > >  \
> > > >          irqentry_exit(regs, state);                                    
> > > >  \
> > > > }                                                                       
> > > > \
> > > >
> > > > \ static noinline void __##func(struct pt_regs *regs) Other changes are 
> > > > required.
> > > >
> > > >
> > > > Clearly we should be adjusting "irq_eframe_link" based on framesize for 
> > > > either DEFINE_IDTENTRY_IRQ() functions or DEFINE_IDTENTRY_SYSVEC() 
> > > > functions.
> > > > Maybe we should have  "irq_eframe_link" for DEFINE_IDTENTRY_IRQ() 
> > > > functions, and another "sysvec_eframe_link" for 
> > > > DEFINE_IDTENTRY_SYSVEC() functions, then try both links.
> > > > Anyhow, the current x86_64_irq_eframe_link_init() code is making 
> > > > assumption on a single common framesize , and actually there is a 
> > > > couple of assumptions:
> > > > - assuming framesizes with hardcoded values
> > > > - assuming a single common framesize
> > > >
> > > > To keep backward compatibility with the current code, which is
> > > > trying +8 (not sure why), the patch I proposed was keeping the existing 
> > > > +8, but it is also trying -8, and also +16 and -16, as to try having 
> > > > more fallbacks in case we didn't guess the proper framesizes.
> > >
> > > Thank you for the fix, I also have a vmcore printing "bogus
> > > exception frame" but there has been no good idea to fix.  The patch
> > > fixed this so
> > >
> > > Acked-by: Kazuhito Hagio <k-hagio...@nec.com>
> > >
> > > $ diff -u before.log after.log
> > >
> > > @@ -577,14 +577,21 @@
> > >    #9 [ffff954a4683cfe0] __irq_exit_rcu at ffffffff86b0948f
> > >   #10 [ffff954a4683cff0] sysvec_apic_timer_interrupt at ffffffff875fe8f2
> > >   --- <IRQ stack> ---
> > > -    RIP: 0000000000000010  RSP: 0000000000000018  RFLAGS: 
> > > ffff954a4437fe80
> > > -    RAX: 000000000000001f  RBX: 000000000000a3e7  RCX: 000000000000000e
> > > -    RDX: 000000003a2e8f1a  RSI: 0000000000000000  RDI: ffffffffffffffff
> > > -    RBP: ffffb53a3f580000   R8: ffff8a51bf5b0a80   R9: 0000001a95a9b7b8
> > > -    R10: 0000000000000000  R11: 00000000000022ef  R12: 0000000000000004
> > > -    R13: ffffffff884acd60  R14: 0000001a95a9b7b8  R15: 0000000000000004
> > > -    ORIG_RAX: ffffffff872fe528  CS: 0202  SS: 1a92a013c6
> > > -WARNING: possibly bogus exception frame
> > > +#11 [ffff954a4437fdd8] asm_sysvec_apic_timer_interrupt at 
> > > ffffffff87800d86
> > > +    [exception RIP: cpuidle_enter_state+216]
> > > +    RIP: ffffffff872fe528  RSP: ffff954a4437fe80  RFLAGS: 00000202
> > > +    RAX: ffff8a51bf5b0a80  RBX: ffffb53a3f580000  RCX: 000000000000001f
> > > +    RDX: 000000000000000e  RSI: 000000003a2e8f1a  RDI: 0000000000000000
> > > +    RBP: 0000000000000004   R8: 0000001a95a9b7b8   R9: 0000000000000000
> > > +    R10: 00000000000022ef  R11: 000000000000a3e7  R12: ffffffff884acd60
> > > +    R13: 0000001a95a9b7b8  R14: 0000000000000004  R15: 0000000000000000
> > > +    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> > > +#12 [ffff954a4437feb8] cpuidle_enter at ffffffff872fe8b9
> > > +#13 [ffff954a4437fed8] cpuidle_idle_call at ffffffff86b624dc
> > > +#14 [ffff954a4437ff10] do_idle at ffffffff86b625fb
> > > +#15 [ffff954a4437ff28] cpu_startup_entry at ffffffff86b62849
> > > +#16 [ffff954a4437ff38] start_secondary at ffffffff86a62a0f
> > > +#17 [ffff954a4437ff50] secondary_startup_64_no_verify at
> > > +ffffffff86a0015a
> > >
> > > Thanks,
> > > Kazu
> > >
> > >
> > > >
> > > > Regards,
> > > > Georges
> > > >
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Lianbo Jiang <liji...@redhat.com>
> > > > Sent: Wednesday, April 10, 2024 6:11 AM
> > > > To: devel@lists.crash-utility.osci.io
> > > > Cc: Aureau, Georges (Kernel Tools ERT) <georges.aur...@hpe.com>
> > > > Subject: Re: [PATCH] x86_64: rhel 9.3: "crash" doesn't handle all
> > > > IRQ exception properly
> > > >
> > > > Hi, Aureau
> > > >
> > > > Thank you for the fix.
> > > >
> > > > On 4/5/24 08:38, devel-requ...@lists.crash-utility.osci.io wrote:
> > > >> Date: Thu, 4 Apr 2024 14:39:06 +0000
> > > >> From: "Aureau, Georges (Kernel Tools
> > > >> ERT)"<georges.aur...@hpe.com>
> > > >> Subject: [Crash-utility] [PATCH] x86_64: rhel 9.3: "crash" doesn't
> > > >>      handle all IRQ exception properly.
> > > >> To:"devel@lists.crash-utility.osci.io"
> > > >>      <devel@lists.crash-utility.osci.io>
> > > >> Message-ID: <sj0pr84mb14821a83d9631672b4f0f9449f...@sj0pr84mb1482.namp
> > > >>      RD84.PROD.OUTLOOK.COM>
> > > >> Content-Type: text/plain; charset="us-ascii"
> > > >>
> > > >>
> > > >> Hello,
> > > >>
> > > >> On x86_64, with rhel 9.3, there are cases where "crash" can't handle 
> > > >> IRQ exception frames properly.
> > > >>
> > > >> For example, "crash" fails with "WARNING: possibly bogus exception 
> > > >> frame":
> > > >>
> > > >> crash> bt -c 30
> > > >> PID: 2898241  TASK: ff4cb0ce0da0c680  CPU: 30   COMMAND: "star-ccm+"
> > > >>    #0 [fffffe4658d88e58] crash_nmi_callback at ffffffffa00675e8
> > > >>    #1 [fffffe4658d88e68] nmi_handle at ffffffffa002ebab
> > > >>    #2 [fffffe4658d88eb0] default_do_nmi at ffffffffa0c56cd0
> > > >>    #3 [fffffe4658d88ed0] exc_nmi at ffffffffa0c56ee1
> > > >>    #4 [fffffe4658d88ef0] end_repeat_nmi at ffffffffa0e01639
> > > >>       [exception RIP: native_queued_spin_lock_slowpath+636]
> > > >>       RIP: ffffffffa0c6b80c  RSP: ff5eba269937ce10  RFLAGS: 00000046
> > > >>       RAX: 0000000000000000  RBX: ff4cb12bcfbb2940  RCX: 
> > > >> 00000000007c0000
> > > >>       RDX: 0000000000000057  RSI: 0000000001600000  RDI: 
> > > >> ff4cb12d67205608
> > > >>       RBP: ff4cb12d67205608   R8: ff90b9c682475940   R9: 
> > > >> 0000000000000000
> > > >>       R10: ff4cb0d70ee4e100  R11: 0000000000000000  R12: 
> > > >> 0000000000000000
> > > >>       R13: 000000000000001e  R14: ff90b9c682474918  R15: 
> > > >> ff90b9c682477120
> > > >>       ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> > > >> --- <NMI exception stack> ---
> > > >>    #5 [ff5eba269937ce10] native_queued_spin_lock_slowpath at 
> > > >> ffffffffa0c6b80c
> > > >>    #6 [ff5eba269937ce30] _raw_spin_lock_irqsave at ffffffffa0c6ad90
> > > >>    #7 [ff5eba269937ce40] free_iova at ffffffffa07df716
> > > >>    #8 [ff5eba269937ce68] fq_ring_free at ffffffffa07dba15
> > > >>    #9 [ff5eba269937ce98] fq_flush_timeout at ffffffffa07dc8fe
> > > >> #10 [ff5eba269937ced0] call_timer_fn at ffffffffa01c00a4
> > > >> #11 [ff5eba269937cef0] __run_timers at ffffffffa01c03ae
> > > >> #12 [ff5eba269937cf88] run_timer_softirq at ffffffffa01c0476
> > > >> #13 [ff5eba269937cf90] __do_softirq at ffffffffa0c6c007
> > > >> #14 [ff5eba269937cfe0] __irq_exit_rcu at ffffffffa010ef61
> > > >> #15 [ff5eba269937cff0] sysvec_apic_timer_interrupt at
> > > >> ffffffffa0c58ca2
> > > >> --- <IRQ stack> ---
> > > >>       RIP: 0000000000000010  RSP: 0000000000000018  RFLAGS: 
> > > >> ff5eba26ddc9f7e8
> > > >>       RAX: 0000000000000a20  RBX: ff5eba26ddc9f940  RCX: 
> > > >> 0000000000001000
> > > >>       RDX: ffffffb559980000  RSI: ff4cb12d67207400  RDI: 
> > > >> ffffffffffffffff
> > > >>       RBP: 0000000000001000   R8: ff5eba26ddc9f940   R9: 
> > > >> ff5eba26ddc9f8af
> > > >>       R10: 0000000000000003  R11: 0000000000000a20  R12: 
> > > >> ff5eba26ddc9f8b0
> > > >>       R13: 000000283c07f000  R14: ff4cb0f5a29a1c00  R15: 
> > > >> 0000000000000001
> > > >>       ORIG_RAX: ffffffffa07c4e60  CS: 0206  SS: 7000001cf0380001
> > > >> bt: WARNING: possibly bogus exception frame
> > > >>
> > > >>
> > > >> Running "crash" with "--machdep irq_eframe_link=0xffffffffffffffe8" 
> > > >> (ie. thus irq_eframe_link = -24) works properly:
> > > >>
> > > >> PID: 2898241  TASK: ff4cb0ce0da0c680  CPU: 30   COMMAND: "star-ccm+"
> > > >>    #0 [fffffe4658d88e58] crash_nmi_callback at ffffffffa00675e8
> > > >>    #1 [fffffe4658d88e68] nmi_handle at ffffffffa002ebab
> > > >>    #2 [fffffe4658d88eb0] default_do_nmi at ffffffffa0c56cd0
> > > >>    #3 [fffffe4658d88ed0] exc_nmi at ffffffffa0c56ee1
> > > >>    #4 [fffffe4658d88ef0] end_repeat_nmi at ffffffffa0e01639
> > > >>       [exception RIP: native_queued_spin_lock_slowpath+636]
> > > >>       RIP: ffffffffa0c6b80c  RSP: ff5eba269937ce10  RFLAGS: 00000046
> > > >>       RAX: 0000000000000000  RBX: ff4cb12bcfbb2940  RCX: 
> > > >> 00000000007c0000
> > > >>       RDX: 0000000000000057  RSI: 0000000001600000  RDI: 
> > > >> ff4cb12d67205608
> > > >>       RBP: ff4cb12d67205608   R8: ff90b9c682475940   R9: 
> > > >> 0000000000000000
> > > >>       R10: ff4cb0d70ee4e100  R11: 0000000000000000  R12: 
> > > >> 0000000000000000
> > > >>       R13: 000000000000001e  R14: ff90b9c682474918  R15: 
> > > >> ff90b9c682477120
> > > >>       ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> > > >> --- <NMI exception stack> ---
> > > >>    #5 [ff5eba269937ce10] native_queued_spin_lock_slowpath at 
> > > >> ffffffffa0c6b80c
> > > >>    #6 [ff5eba269937ce30] _raw_spin_lock_irqsave at ffffffffa0c6ad90
> > > >>    #7 [ff5eba269937ce40] free_iova at ffffffffa07df716
> > > >>    #8 [ff5eba269937ce68] fq_ring_free at ffffffffa07dba15
> > > >>    #9 [ff5eba269937ce98] fq_flush_timeout at ffffffffa07dc8fe
> > > >> #10 [ff5eba269937ced0] call_timer_fn at ffffffffa01c00a4
> > > >> #11 [ff5eba269937cef0] __run_timers at ffffffffa01c03ae
> > > >> #12 [ff5eba269937cf88] run_timer_softirq at ffffffffa01c0476
> > > >> #13 [ff5eba269937cf90] __do_softirq at ffffffffa0c6c007
> > > >> #14 [ff5eba269937cfe0] __irq_exit_rcu at ffffffffa010ef61
> > > >> #15 [ff5eba269937cff0] sysvec_apic_timer_interrupt at
> > > >> ffffffffa0c58ca2
> > > >> --- <IRQ stack> ---
> > > >> #16 [ff5eba26ddc9f738] asm_sysvec_apic_timer_interrupt at 
> > > >> ffffffffa0e00e06
> > > >>       [exception RIP: alloc_pte.constprop.0+32]
> > > >>       RIP: ffffffffa07c4e60  RSP: ff5eba26ddc9f7e8  RFLAGS: 00000206
> > > >>       RAX: ff5eba26ddc9f940  RBX: 0000000000001000  RCX: 
> > > >> 0000000000000a20
> > > >>       RDX: 0000000000001000  RSI: ffffffb559980000  RDI: 
> > > >> ff4cb12d67207400
> > > >>       RBP: ff5eba26ddc9f8b0   R8: ff5eba26ddc9f8af   R9: 
> > > >> 0000000000000003
> > > >>       R10: 0000000000000a20  R11: ff5eba26ddc9f940  R12: 
> > > >> 000000283c07f000
> > > >>       R13: ff4cb0f5a29a1c00  R14: 0000000000000001  R15: 
> > > >> ff4cb0f5a29a1bf8
> > > >>       ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> > > >> #17 [ff5eba26ddc9f830] iommu_v1_map_pages at ffffffffa07c5648
> > > >> #18 [ff5eba26ddc9f8f8] __iommu_map at ffffffffa07d7803
> > > >> #19 [ff5eba26ddc9f990] iommu_map_sg at ffffffffa07d7b71
> > > >> #20 [ff5eba26ddc9f9f0] iommu_dma_map_sg at ffffffffa07ddcc9
> > > >> #21 [ff5eba26ddc9fa90] __dma_map_sg_attrs at ffffffffa01b5205
> > > >> #22 [ff5eba26ddc9fa98] dma_map_sgtable at ffffffffa01b526d
> > > >> #23 [ff5eba26ddc9faa8] ib_umem_get at ffffffffc112f308
> > > >> [ib_uverbs]
> > > >> #24 [ff5eba26ddc9fb00] mlx5_ib_reg_user_mr at ffffffffc11eb991
> > > >> [mlx5_ib]
> > > >> #25 [ff5eba26ddc9fb40] ib_uverbs_reg_mr at ffffffffc1123bc4
> > > >> [ib_uverbs]
> > > >> #26 [ff5eba26ddc9fbb0]
> > > >> ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE
> > > >> at
> > > >> ffffffffc112d064 [ib_uverbs]
> > > >> #27 [ff5eba26ddc9fbe0] ib_uverbs_run_method at ffffffffc112a121
> > > >> [ib_uverbs]
> > > >> #28 [ff5eba26ddc9fc30] ib_uverbs_cmd_verbs at ffffffffc112a332
> > > >> [ib_uverbs]
> > > >> #29 [ff5eba26ddc9fe68] ib_uverbs_ioctl at ffffffffc112a494
> > > >> [ib_uverbs]
> > > >> #30 [ff5eba26ddc9fea8] __x64_sys_ioctl at ffffffffa0438f67
> > > >> #31 [ff5eba26ddc9fed8] do_syscall_64 at ffffffffa0c55189
> > > >> #32 [ff5eba26ddc9ff50] entry_SYSCALL_64_after_hwframe at 
> > > >> ffffffffa0e000ea
> > > >>       RIP: 0000146ef3a3ec6b  RSP: 00007ffe3b7dc5b8  RFLAGS: 00000246
> > > >>       RAX: ffffffffffffffda  RBX: 00007ffe3b7dc6a8  RCX: 
> > > >> 0000146ef3a3ec6b
> > > >>       RDX: 00007ffe3b7dc690  RSI: 00000000c0181b01  RDI: 
> > > >> 0000000000000011
> > > >>       RBP: 00007ffe3b7dc670   R8: 0000146ed9549010   R9: 
> > > >> 00007ffe3b7dc7c4
> > > >>       R10: 0000000000000009  R11: 0000000000000246  R12: 
> > > >> 00007ffe3b7dc7c4
> > > >>       R13: 000000000000000c  R14: 000000000000000c  R15: 
> > > >> 0000146ed9549150
> > > >>       ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b
> > > >>
> > > >>
> > > >> Some background:
> > > >>
> > > >> asm_common_interrupt:
> > > >>         callq  error_entry
> > > >>         movq   %rax,%rsp
> > > >>         movq   %rsp,%rdi
> > > >>         movq   0x78(%rsp),%rsi
> > > >>         movq   $-0x1,0x78(%rsp)
> > > >>         call common_interrupt    # rsp pointing to regs
> > > >>
> > > >> common_interrupt
> > > >>         pushq  %r12
> > > >>         pushq  %rbp
> > > >>         pushq  %rbx
> > > >>         [...]
> > > >>         movq   hardirq_stack_ptr,%r11
> > > >>         movq   %rsp,(%r11)
> > > >>         movq   %r11,%rsp
> > > >>         [...]
> > > >>         call __common_interrupt  # rip:common_interrupt
> > > >>
> > > >> So frame_size(rip:common_interrupt) = 32 (3 push + ret).
> > > >>
> > > >> Hence "machdep->machspec->irq_eframe_link = -32;"  (see 
> > > >> x86_64_irq_eframe_link_init()).
> > > >>
> > > >> Now:
> > > >>
> > > >> asm_sysvec_apic_timer_interrupt:
> > > >>         pushq  $-0x1
> > > >>         callq  error_entry
> > > >>         movq   %rax,%rsp
> > > >>         movq   %rsp,%rdi
> > > >>         callq  sysvec_apic_timer_interrupt
> > > >>
> > > >> sysvec_apic_timer_interrupt
> > > >>         pushq  %r12
> > > >>         pushq  %rbp
> > > >>         [...]
> > > >>         movq   hardirq_stack_ptr,%r11
> > > >>         movq   %rsp,(%r11)
> > > >>         movq   %r11,%rsp
> > > >>         [...]
> > > >>         call __sysvec_apic_timer_interrupt  #
> > > >> rip:sysvec_apic_timer_interrupt
> > > >>
> > > >> Here frame_size(rip:sysvec_apic_timer_interrupt) = 24 (2 push +
> > > >> ret)
> > > >>
> > > >> We should also notice that:
> > > >>
> > > >> rip  = *(hardirq_stack_ptr - 8)
> > > >> rsp  = *(hardirq_stack_ptr)
> > > >> regs = rsp - frame_size(rip)
> > > >>
> > > >> But x86_64_get_framesize() does not work with IRQ handlers (returns 0).
> > > >> So not many options other than hardcoding the most likely value and 
> > > >> looking around it.
> > > >> Actually x86_64_irq_eframe_link() was trying -32 (default), and then 
> > > >> -40,  but not -24 !
> > > >>
> > > >> Signed-off-by: Georges Aureau<georges.aur...@hpe.com>
> > > >> --
> > > >> diff --git a/x86_64.c b/x86_64.c
> > > >> index aec82b0..90c2a91 100644
> > > >> --- a/x86_64.c
> > > >> +++ b/x86_64.c
> > > >> @@ -6623,13 +6623,14 @@ x86_64_irq_eframe_link_init(void)
> > > >>
> > > >>    /*
> > > >>     *  Calculate and verify the IRQ exception frame location from
> > > >> the
> > > >> - *  stack reference at the top of the IRQ stack, possibly
> > > >> adjusting
> > > >> - *  the ms->irq_eframe_link value.
> > > >> + *  stack reference at the top of the IRQ stack, keep
> > > >> + ms->irq_eframe_link
> > > >> + *  as the most likely value, and try a few sizes around it.
> > > >>     */
> > > >>    static ulong
> > > >>    x86_64_irq_eframe_link(ulong stkref, struct bt_info *bt, FILE *ofp)
> > > >>    {
> > > >>           ulong irq_eframe;
> > > >> +       int i, try[] = { 8, -8, 16, -16 };
> > > >>
> > > >>           if (x86_64_exception_frame(EFRAME_VERIFY, stkref, 0, bt, 
> > > >> ofp))
> > > >>                   return stkref;
> > > >> @@ -6639,9 +6640,10 @@ x86_64_irq_eframe_link(ulong stkref, struct 
> > > >> bt_info *bt, FILE *ofp)
> > > >>           if (x86_64_exception_frame(EFRAME_VERIFY, irq_eframe, 0, bt, 
> > > >> ofp))
> > > >>                   return irq_eframe;
> > > >>
> > > >> -       if (x86_64_exception_frame(EFRAME_VERIFY, irq_eframe+8, 0, bt, 
> > > >> ofp)) {
> > > >> -               machdep->machspec->irq_eframe_link -= 8;
> >
> > Why don't we update "machdep->machspec->irq_eframe_link" in the following 
> > added code?
> >
> > > >> -               return (irq_eframe + 8);
> > > >> +       for (i=0; i<sizeof(try)/sizeof(int); i++) {
> > > >> +               if (x86_64_exception_frame(EFRAME_VERIFY, 
> > > >> irq_eframe+try[i], 0, bt, ofp)) {
> > > >> +                       return (irq_eframe + try[i]);
> > > >> +               }
> > > >>           }
> > > >>
> > > >>           return irq_eframe;
> >
> > I guess if we don't need to update
> > "machdep->machspec->irq_eframe_link", maybe we can merge the code as 
> > follows?
> >
> >  x86_64_irq_eframe_link(ulong stkref, struct bt_info *bt, FILE *ofp)  {
> >         ulong irq_eframe;
> > +       int i, try[] = { 0, 8, -8, 16, -16 };
> >
> >         if (x86_64_exception_frame(EFRAME_VERIFY, stkref, 0, bt, ofp))
> >                 return stkref;
> >
> >         irq_eframe = stkref - machdep->machspec->irq_eframe_link;
> >
> > -       if (x86_64_exception_frame(EFRAME_VERIFY, irq_eframe, 0, bt, ofp))
> > -               return irq_eframe;
> > -
> > -       if (x86_64_exception_frame(EFRAME_VERIFY, irq_eframe+8, 0, bt, 
> > ofp)) {
> > -               machdep->machspec->irq_eframe_link -= 8;
> > -               return (irq_eframe + 8);
> > +       for (i=0; i<sizeof(try)/sizeof(int); i++) {
> > +               if (x86_64_exception_frame(EFRAME_VERIFY,
> > irq_eframe+try[i], 0, bt, ofp)) {
> > +                       return (irq_eframe + try[i]);
> > +               }
> >         }
> >
> >         return irq_eframe;
> >
> > Thanks,
> > Tao Liu
> >
> > > >
> > > >
> > > > Can you help to try the following changes? It could be good to check 
> > > > the relevant symbol, but I'm not sure if this can work well for you and 
> > > > cover some corner cases.
> > > >
> > > > diff --git a/x86_64.c b/x86_64.c
> > > > index aec82b03b62e..251e224c013b 100644
> > > > --- a/x86_64.c
> > > > +++ b/x86_64.c
> > > > @@ -6574,6 +6574,8 @@ x86_64_irq_eframe_link_init(void)
> > > >        if (symbol_exists("asm_common_interrupt")) {
> > > >            if (symbol_exists("asm_call_on_stack"))
> > > >                machdep->machspec->irq_eframe_link = -64;
> > > > +        else if (symbol_exists("sysvec_apic_timer_interrupt"))
> > > > +            machdep->machspec->irq_eframe_link = -24;
> > > >            else
> > > >                machdep->machspec->irq_eframe_link = -32;
> > > >            return;
> > > >
> > > >
> > > > Thanks
> > > >
> > > > Lianbo
> > > >
> > > > --
> > > > Crash-utility mailing list -- devel@lists.crash-utility.osci.io To
> > > > unsubscribe send an email to
> > > > devel-le...@lists.crash-utility.osci.io
> > > > https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.
> > > > io
> > > > / Contribution Guidelines:
> > > > https://github.com/crash-utility/crash/wiki
> > > --
> > > Crash-utility mailing list -- devel@lists.crash-utility.osci.io To
> > > unsubscribe send an email to devel-le...@lists.crash-utility.osci.io
> > > https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io
> > > / Contribution Guidelines:
> > > https://github.com/crash-utility/crash/wiki
> >
>
--
Crash-utility mailing list -- devel@lists.crash-utility.osci.io
To unsubscribe send an email to devel-le...@lists.crash-utility.osci.io
https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
Contribution Guidelines: https://github.com/crash-utility/crash/wiki

Reply via email to