Re: Unexpected exception with apic timer as PWM

e . guidieri Thu, 23 Mar 2017 10:26:50 -0700

On Thursday, 23 March 2017 at 16:58:44 UTC+1, J. Kiszka wrote:
> On 2017-03-23 12:50, [email protected] wrote:
> > On Saturday, 18 March 2017 at 09:32:19 UTC+1, Jan Kiszka wrote:
> >> On 2017-03-17 14:42, [email protected] wrote:
> >>> On Friday, 17 March 2017 at 13:43:32 UTC+1, J. Kiszka wrote:
> >>>> On 2017-03-17 13:06, Claudio Scordino wrote:
> >>>>> Dear all,
> >>>>>
> >>>>> we are facing an unexpected exception when running the apic timer to
> >>>>> drive a GPIO as a software PWM.
> >>>>>
> >>>>> The platform is x86. The software runs in a bare-metal cell. The PWM
> >>>>> frequency is 5 kHz.
> >>>>>
> >>>>> When the duty cycle is very high or very low (i.e., two consecutive
> >>>>> interrupts fall close together) we hit the following unexpected exception:
> >>>>>
> >>>>> FATAL: Unhandled VM-Exit, reason 2
> >>>>> qualification 0
> >>>>> vectoring info: 0 interrupt info: 0
> >>>>> RIP: 0x00000000000f15d6 RSP: 0x00000000000dff08 FLAGS: 10002
> >>>>
> >>>> "objdump -dS inmate-linked.o" can tell you which instruction at RIP
> >>>> causes this fault. It's a triple fault, likely started off by a general
> >>>> protection or page fault.
> >>>>
> >>>> Jan
> >>>
> >>> Hi, I'm Errico, Claudio's coworker, and I'm currently working on this
> >>> issue.
> >>>
> >>> The fault happens when we re-arm the APIC timer:
> >>>
> >>> 00000000000f15b5 <apic_timer_set>:
> >>>
> >>> void apic_timer_set(unsigned long timeout_ns)
> >>> {
> >>>   unsigned long long ticks =
> >>>           (unsigned long long)timeout_ns * divided_apic_freq;
> >>>   write_msr(X2APIC_TMICT, ticks / NS_PER_SEC);
> >>>    f15b5: 48 89 f8                mov    %rdi,%rax
> >>>    f15b8: b9 00 ca 9a 3b          mov    $0x3b9aca00,%ecx
> >>>    f15bd: 31 d2                   xor    %edx,%edx
> >>>    f15bf: 48 0f af 05 f1 10 ff    imul   -0xef0f(%rip),%rax        # e26b8 <divided_apic_freq>
> >>>    f15c6: ff 
> >>>    f15c7: 48 f7 f1                div    %rcx
> >>>    f15ca: b9 38 08 00 00          mov    $0x838,%ecx
> >>>    f15cf: 48 89 c2                mov    %rax,%rdx
> >>>    f15d2: 48 c1 ea 20             shr    $0x20,%rdx
> >>>    f15d6: 0f 30                   wrmsr  
> >>>    f15d8: c3                      retq   
> >>>
> >>> It is the *wrmsr* inside apic_timer_set that generates the fault.
> >>> Since I'm not an x86 expert (I'm more of an embedded guy), I'm asking for
> >>> tips and ideas.
> >>
> >> Interesting. This writes to a 32-bit x2APIC register. The manual states:
> >> "The upper 32-bits of all x2APIC MSRs (except for the ICR) are
> >> reserved." But the timer value calculation lets EDX (lower part of RDX)
> >> become non-zero.
> >>
> >>> RAX: 0x000000044b82f9d8 RBX: 0x00000000000f060f RCX: 0x0000000000000838
> >>> RDX: 0x0000000000000004 RSI: 0x0000000000000a36 RDI: 0xffffffffffffe134
> >>
> >> Never tested if hardware actually explodes over this, but it would have
> >> the right to do so. Simple check: confine ticks / NS_PER_SEC to 32 bits
> >> and see if that resolves the crash.
> >>
> >> But that may cause issues regarding the desired timeout. A careful
> >> analysis of what happens here with respect to the timeout calculation
> >> will be needed. E.g. what is the timeout_ns value in those cases?
> >>
> >> As you copied from apic-demo and use the inmates library, those may
> >> share the issue.
> >>
> >> Jan
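
For illustration, the register dump above can be checked against the
arithmetic in apic_timer_set(). RDI still holds timeout_ns at the wrmsr (the
function copies it to RAX and never modifies it), and 0xffffffffffffe134 read
as a signed 64-bit value is -7884, so the requested timeout was most likely
negative. The quotient in RAX (0x44b82f9d8) then exceeds 32 bits, and its
upper half is the 4 seen in RDX. Below is a minimal sketch of the check
suggested above, confining the tick count to 32 bits. The helper names
timeout_to_ticks() and ticks_overflow() are hypothetical and not part of the
inmates library; saturating is only one possible policy, and a negative
timeout probably deserves a fix in the caller instead.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/* Hypothetical helper: true if the value that apic_timer_set() would hand
 * to wrmsr has non-zero upper 32 bits (i.e. EDX would be non-zero). */
static int ticks_overflow(uint64_t timeout_ns, uint64_t divided_apic_freq)
{
	/* Same arithmetic as apic_timer_set(); the product wraps modulo 2^64. */
	uint64_t ticks = timeout_ns * divided_apic_freq / NS_PER_SEC;

	return (ticks >> 32) != 0;
}

/* Hypothetical helper: convert a timeout in nanoseconds to timer ticks,
 * saturating at the 32-bit limit of the initial-count register. */
static uint32_t timeout_to_ticks(uint64_t timeout_ns, uint64_t divided_apic_freq)
{
	assert(divided_apic_freq != 0);

	uint64_t ticks = timeout_ns * divided_apic_freq / NS_PER_SEC;

	return ticks > UINT32_MAX ? UINT32_MAX : (uint32_t)ticks;
}
```

With such a helper, the write would become
write_msr(X2APIC_TMICT, timeout_to_ticks(timeout_ns, divided_apic_freq)),
which keeps EDX at zero regardless of the input.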
> > 
> > Thank you,
> > 
> > I was able to fix the previous issue.
> > I also changed the APIC timer configuration: I am now using it in
> > TSC-deadline mode, which gives better frequency stability for PWM generation.
> > 
> > But while adding features to my demo I found what I think is a race
> > condition. It can occur when an instruction trap (like the one needed to
> > handle an In/Out instruction) coincides with a local IRQ (the one generated
> > by the APIC timer).
> > 
> > When this happens, it seems that the context of the main function is not
> > correctly restored (volatile registers are corrupted; in particular %EDX,
> > used as the source register for the In instruction, is zeroed).
> > 
> > Exception Message:
> > 
> > FATAL: Invalid PIO read, port: 0 size: 1
> > RIP: 0x00000000000f0228 RSP: 0x00000000000dffd0 FLAGS: 246
> > RAX: 0x0000000000000000 RBX: 0x00000000000f05a8 RCX: 0x0000000000000000
> > RDX: 0x0000000000000000 RSI: 0x0000000000000a35 RDI: 0x0000000000000a36
> 
> Already rax is zero. But if you look at irq_common in
> inmates/lib/x86/int.c, you see that both rax and rdx are saved/restored
> on interrupts. Seems more likely that something goes wrong with the
> stack / rsp (stack pointer).
> 
> > CS: 10 BASE: 0x0000000000000000 AR-BYTES: a09b EFER.LMA 1
> > CR0: 0x0000000080010031 CR3: 0x00000000000f3000 CR4: 0x0000000000002020
> > EFER: 0x0000000000000500
> > Parking CPU 3 (Cell: "pwm-demo")
> > Closing cell "pwm-demo"
> > Page pool usage after cell destruction: mem 4316/16327, remap 16459/131072
> > CPU 3 received SIPI, vector 98
> > 
> > Faulty Code:
> > 
> > static inline u8 inb(u16 port)
> > {
> >    f021a:   89 f8                   mov    %edi,%eax
> >    f021c:   66 89 44 24 ec          mov    %ax,-0x14(%rsp)
> >     u8 v;
> >     asm volatile("inb %1,%0" : "=a" (v) : "dN" (port));
> >    f0221:   0f b7 44 24 ec          movzwl -0x14(%rsp),%eax
> 
> This pattern looks suspicious: the transfer of the port over the stack
> happens via an unreserved area, one that is overwritten when an
> interrupt hits right after mov %ax,-0x14(%rsp)... Ah, we are missing a
> magic switch. Does this help?
> 
> diff --git a/inmates/lib/x86/Makefile.lib b/inmates/lib/x86/Makefile.lib
> index f54259d..54bddae 100644
> --- a/inmates/lib/x86/Makefile.lib
> +++ b/inmates/lib/x86/Makefile.lib
> @@ -10,7 +10,7 @@
>  # the COPYING file in the top-level directory.
>  #
>  
> -KBUILD_CFLAGS += -m64
> +KBUILD_CFLAGS += -m64 -mno-red-zone
>  GCOV_PROFILE := n
>  
>  define DECLARE_TARGETS =
> 
> But I'm afraid the hypervisor needs it as well, and we were very lucky
> so far...
> 
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT RDA ITP SES-DE
> Corporate Competence Center Embedded Linux
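
To make the red-zone problem concrete, here is a small user-space simulation
of the stack layout. It is only a sketch under stated assumptions: the
interrupt is delivered on the same stack (no IST, no privilege change), so in
64-bit mode the CPU aligns RSP down to 16 bytes and pushes SS, RSP, RFLAGS, CS
and RIP directly below it. With RSP 16-byte aligned as in the dump
(0xdffd0), the slot at -0x14(%rsp) falls inside the saved RFLAGS quadword, in
bytes that are always zero, which would match the "port: 0" in the fault
message. The function names and the simulated stack are hypothetical and
exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_SIZE 256

/* Simulated interrupt delivery: align RSP down to 16 bytes, then push the
 * five 8-byte slots SS, RSP, RFLAGS, CS, RIP below it. */
static size_t push_interrupt_frame(uint8_t *stack, size_t rsp, uint64_t rflags)
{
	const uint64_t frame[5] = {
		0x0,      /* SS (placeholder value) */
		rsp,      /* interrupted RSP */
		rflags,   /* interrupted RFLAGS, upper 32 bits are zero */
		0x10,     /* CS, as in the dump */
		0xf0221,  /* RIP of the instruction after the spill */
	};
	size_t sp = rsp & ~(size_t)0xf;

	assert(sp >= sizeof(frame) && sp <= STACK_SIZE);
	for (int i = 0; i < 5; i++) {
		sp -= 8;
		memcpy(&stack[sp], &frame[i], 8);
	}
	return sp;
}

/* Mimics the compiled inb(): spill the port into the red zone, optionally
 * take an interrupt, then reload the port from the same slot. */
static uint16_t spill_and_reload_port(uint16_t port, int interrupt_hits)
{
	uint8_t stack[STACK_SIZE] = {0};
	size_t rsp = 0xd0;  /* 16-byte aligned, like RSP in the dump */
	uint16_t reloaded;

	memcpy(&stack[rsp - 0x14], &port, sizeof(port));  /* mov %ax,-0x14(%rsp) */
	if (interrupt_hits)
		push_interrupt_frame(stack, rsp, 0x246);
	memcpy(&reloaded, &stack[rsp - 0x14], sizeof(reloaded)); /* movzwl -0x14(%rsp),%eax */
	return reloaded;
}
```

Without the interrupt the port survives the round trip; with it, the reloaded
value is 0. Building with -mno-red-zone makes the compiler reserve stack space
for such spills by adjusting RSP first, so the pushed frame lands below them.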


Perfect.

Errico
