On Thursday, 23 March 2017 at 16:58:44 UTC+1, J. Kiszka wrote:
> On 2017-03-23 12:50, [email protected] wrote:
> > On Saturday, 18 March 2017 at 09:32:19 UTC+1, Jan Kiszka wrote:
> >> On 2017-03-17 14:42, [email protected] wrote:
> >>> On Friday, 17 March 2017 at 13:43:32 UTC+1, J. Kiszka wrote:
> >>>> On 2017-03-17 13:06, Claudio Scordino wrote:
> >>>>> Dear all,
> >>>>>
> >>>>> we are facing an unexpected exception when running the APIC timer to
> >>>>> drive a GPIO as a software PWM.
> >>>>>
> >>>>> The platform is x86. The software runs in a bare-metal cell. The PWM
> >>>>> frequency is 5 kHz.
> >>>>>
> >>>>> When the duty cycle is very high or very low (i.e., two subsequent
> >>>>> interrupts get closer) we face the following unexpected exception:
> >>>>>
> >>>>> FATAL: Unhandled VM-Exit, reason 2
> >>>>> qualification 0
> >>>>> vectoring info: 0 interrupt info: 0
> >>>>> RIP: 0x00000000000f15d6 RSP: 0x00000000000dff08 FLAGS: 10002
> >>>>
> >>>> "objdump -dS inmate-linked.o" can tell you which instruction at RIP
> >>>> causes this fault. It's a triple fault, likely started off by a general
> >>>> protection or page fault.
> >>>>
> >>>> Jan
> >>>
> >>> Hi, I'm Errico, Claudio's coworker, and I'm currently working on this
> >>> issue.
> >>>
> >>> The fault happens when we re-arm the APIC timer:
> >>>
> >>> 00000000000f15b5 <apic_timer_set>:
> >>>
> >>> void apic_timer_set(unsigned long timeout_ns)
> >>> {
> >>>     unsigned long long ticks =
> >>>         (unsigned long long)timeout_ns * divided_apic_freq;
> >>>     write_msr(X2APIC_TMICT, ticks / NS_PER_SEC);
> >>> f15b5: 48 89 f8              mov %rdi,%rax
> >>> f15b8: b9 00 ca 9a 3b        mov $0x3b9aca00,%ecx
> >>> f15bd: 31 d2                 xor %edx,%edx
> >>> f15bf: 48 0f af 05 f1 10 ff  imul -0xef0f(%rip),%rax # e26b8 <divided_apic_freq>
> >>> f15c6: ff
> >>> f15c7: 48 f7 f1              div %rcx
> >>> f15ca: b9 38 08 00 00        mov $0x838,%ecx
> >>> f15cf: 48 89 c2              mov %rax,%rdx
> >>> f15d2: 48 c1 ea 20           shr $0x20,%rdx
> >>> f15d6: 0f 30                 wrmsr
> >>> f15d8: c3                    retq
> >>>
> >>> It is the *wrmsr* inside apic_timer_set that generates the fault.
> >>> Since I'm not an x86 expert (I'm more of an embedded guy), I'm asking
> >>> for tips and ideas.
> >>
> >> Interesting. This writes to a 32-bit x2APIC register. The manual states:
> >> "The upper 32-bits of all x2APIC MSRs (except for the ICR) are
> >> reserved." But the timer value calculation lets EDX (lower part of RDX)
> >> become non-zero.
> >>
> >>> RAX: 0x000000044b82f9d8 RBX: 0x00000000000f060f RCX: 0x0000000000000838
> >>> RDX: 0x0000000000000004 RSI: 0x0000000000000a36 RDI: 0xffffffffffffe134
> >>
> >> Never tested if hardware actually explodes over this, but it would have
> >> the right to do so. Simple check: confine ticks / NS_PER_SEC to 32 bits
> >> and see if that resolves the crash.
> >>
> >> But that may cause issues regarding the desired timeout. A careful
> >> analysis of what happens here w.r.t. the timeout calculation will be
> >> needed. E.g. what is the timeout_ns value in those cases?
> >>
> >> As you copied from apic-demo and use the inmates library, those may
> >> share the issue.
> >>
> >> Jan
> >
> > Thank you,
> >
> > I was able to fix the previous issue.
> > Moreover, I changed the APIC timer configuration: I'm currently using it
> > in TSC-deadline mode, getting better frequency stability in PWM
> > generation.
> >
> > But while adding features to my demo I discovered what I think is a race
> > condition that can happen when an "instruction trap" (like the one
> > needed to handle In/Out instructions) coincides with a local IRQ (the
> > one generated by the APIC timer).
> >
> > When this scenario happens, it seems that the context of the main
> > function is not correctly restored (volatile registers are corrupted; in
> > particular the %EDX register, used as source register for the In
> > instruction, is zeroed).
> >
> > Exception Message:
> >
> > FATAL: Invalid PIO read, port: 0 size: 1
> > RIP: 0x00000000000f0228 RSP: 0x00000000000dffd0 FLAGS: 246
> > RAX: 0x0000000000000000 RBX: 0x00000000000f05a8 RCX: 0x0000000000000000
> > RDX: 0x0000000000000000 RSI: 0x0000000000000a35 RDI: 0x0000000000000a36
>
> Already rax is zero. But if you look at irq_common in
> inmates/lib/x86/int.c, you see that both rax and rdx are saved/restored
> on interrupts. Seems more likely that something goes wrong with the
> stack / rsp (stack pointer).
>
> > CS: 10 BASE: 0x0000000000000000 AR-BYTES: a09b EFER.LMA 1
> > CR0: 0x0000000080010031 CR3: 0x00000000000f3000 CR4: 0x0000000000002020
> > EFER: 0x0000000000000500
> > Parking CPU 3 (Cell: "pwm-demo")
> > Closing cell "pwm-demo"
> > Page pool usage after cell destruction: mem 4316/16327, remap 16459/131072
> > CPU 3 received SIPI, vector 98
> >
> > Faulty Code:
> >
> > static inline u8 inb(u16 port)
> > {
> > f021a: 89 f8           mov %edi,%eax
> > f021c: 66 89 44 24 ec  mov %ax,-0x14(%rsp)
> >     u8 v;
> >     asm volatile("inb %1,%0" : "=a" (v) : "dN" (port));
> > f0221: 0f b7 44 24 ec  movzwl -0x14(%rsp),%eax
>
> This pattern looks suspicious: the transfer of the port over the stack
> happens via an unreserved area, one that is overwritten when an
> interrupt hits right after mov %ax,-0x14(%rsp)...
> Ah, we are missing a magic switch. Does this help?
>
> diff --git a/inmates/lib/x86/Makefile.lib b/inmates/lib/x86/Makefile.lib
> index f54259d..54bddae 100644
> --- a/inmates/lib/x86/Makefile.lib
> +++ b/inmates/lib/x86/Makefile.lib
> @@ -10,7 +10,7 @@
>  # the COPYING file in the top-level directory.
>  #
>
> -KBUILD_CFLAGS += -m64
> +KBUILD_CFLAGS += -m64 -mno-red-zone
>  GCOV_PROFILE := n
>
>  define DECLARE_TARGETS =
>
> But I'm afraid the hypervisor needs it as well, and we were very lucky
> so far...
>
> Jan
>
> --
> Siemens AG, Corporate Technology, CT RDA ITP SES-DE
> Corporate Competence Center Embedded Linux
Perfect.

Errico
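
A note on the first issue (the wrmsr fault): the quoted register dump shows RDI = 0xffffffffffffe134, which is -7884 when read as a signed 64-bit value, so the timeout may already have been negative when apic_timer_set was called. Below is a minimal sketch of Jan's suggestion to confine the tick count to 32 bits. It is not the fix that was actually applied, which the thread does not show. The helper name timeout_to_ticks is hypothetical (not part of the inmates library), and the sketch assumes a non-zero timer frequency below 2^32 Hz.

```c
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/*
 * Hypothetical helper, not part of the inmates library.
 * Converts a timeout in nanoseconds to timer ticks that fit the
 * 32-bit initial-count register. Assumes timer_freq_hz is non-zero.
 */
static uint32_t timeout_to_ticks(int64_t timeout_ns, uint32_t timer_freq_hz)
{
	uint64_t ns, secs, rem, ticks;

	/* Deadline already passed: fire as soon as possible.
	 * A count of 0 would stop the timer, so use 1 instead. */
	if (timeout_ns <= 0)
		return 1;

	ns = (uint64_t)timeout_ns;
	secs = ns / NS_PER_SEC;
	rem = ns % NS_PER_SEC;

	/* At least one tick per second, so this already exceeds 32 bits. */
	if (secs > UINT32_MAX)
		return UINT32_MAX;

	/* Split the multiplication so it cannot overflow 64 bits. */
	ticks = secs * timer_freq_hz + rem * timer_freq_hz / NS_PER_SEC;

	if (ticks > UINT32_MAX)
		return UINT32_MAX;

	return ticks ? (uint32_t)ticks : 1;
}
```

The result always fits in EAX, so a subsequent write to X2APIC_TMICT would leave the reserved upper half (EDX) at zero. Whether clamping to the maximum or to a single tick is the right policy depends on the PWM logic around it.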

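A note on the second issue: the x86-64 System V ABI lets leaf functions use the 128 bytes below rsp (the "red zone") without adjusting rsp, which is what the quoted inb code does with -0x14(%rsp). The toy model below is plain C on a byte array, not real interrupt handling. It assumes the CPU pushes a 40-byte frame (SS, RSP, RFLAGS, CS, RIP) directly below rsp when an interrupt is delivered in 64-bit mode without a stack switch. All names and the pushed values are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { STACK_SIZE = 256 };

/* Toy stack: mem[] grows downwards, sp is an index into mem[]. */
struct toy_stack {
	uint8_t mem[STACK_SIZE];
	size_t sp;
};

static void push64(struct toy_stack *s, uint64_t v)
{
	s->sp -= 8;
	memcpy(&s->mem[s->sp], &v, sizeof(v));
}

/* Model of interrupt delivery: five 8-byte pushes below the current sp. */
static void take_interrupt(struct toy_stack *s)
{
	size_t saved_sp = s->sp;

	push64(s, 0);        /* SS */
	push64(s, saved_sp); /* RSP */
	push64(s, 0x246);    /* RFLAGS (illustrative value) */
	push64(s, 0x10);     /* CS */
	push64(s, 0xf0221);  /* RIP (illustrative value) */

	s->sp = saved_sp;    /* iretq restores the interrupted sp */
}

/*
 * Spill `port` to the stack, take an interrupt, reload the port.
 * use_red_zone != 0: store at sp - 0x14 without moving sp.
 * use_red_zone == 0: reserve a frame first, store above sp.
 */
static uint16_t read_port_after_irq(uint16_t port, int use_red_zone)
{
	struct toy_stack s = { .sp = STACK_SIZE };
	size_t slot;
	uint16_t reloaded;

	if (use_red_zone) {
		slot = s.sp - 0x14;
	} else {
		s.sp -= 0x18;      /* like "sub $0x18,%rsp" */
		slot = s.sp + 4;
	}

	memcpy(&s.mem[slot], &port, sizeof(port));
	take_interrupt(&s);
	memcpy(&reloaded, &s.mem[slot], sizeof(reloaded));

	return reloaded;
}
```

In this model the spilled port lands in the upper half of the saved RFLAGS, which is zero, so the reloaded value is 0. That would be consistent with the reported "Invalid PIO read, port: 0", although the model is only a sketch. With a reserved frame, as with -mno-red-zone, the slot sits above sp and survives the interrupt.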