On 2017-03-23 12:50, [email protected] wrote:
> On Saturday, 18 March 2017 at 09:32:19 UTC+1, Jan Kiszka wrote:
>> On 2017-03-17 14:42, [email protected] wrote:
>>> On Friday, 17 March 2017 at 13:43:32 UTC+1, J. Kiszka wrote:
>>>> On 2017-03-17 13:06, Claudio Scordino wrote:
>>>>> Dear all,
>>>>>
>>>>> we are facing an unexpected exception when running the apic timer to
>>>>> drive a GPIO as a software PWM.
>>>>>
>>>>> The platform is x86. The software runs in a bare-metal cell. The PWM
>>>>> frequency is 5 KHz.
>>>>>
>>>>> When the duty cycle is very high or very low (i.e., two subsequent
>>>>> interrupts get closer) we face the following unexpected exception:
>>>>>
>>>>> FATAL: Unhandled VM-Exit, reason 2
>>>>> qualification 0
>>>>> vectoring info: 0 interrupt info: 0
>>>>> RIP: 0x00000000000f15d6 RSP: 0x00000000000dff08 FLAGS: 10002
>>>>
>>>> "objdump -dS inmate-linked.o" can tell you which instruction at RIP
>>>> causes this fault. It's a triple fault, likely started off by a general
>>>> protection or page fault.
>>>>
>>>> Jan
>>>
>>> Hi, I'm Errico, Claudio's coworker, and I'm currently working on this issue.
>>>
>>> The fault happens when we re-arm the APIC timer:
>>>
>>> 00000000000f15b5 <apic_timer_set>:
>>>
>>> void apic_timer_set(unsigned long timeout_ns)
>>> {
>>>     unsigned long long ticks =
>>>             (unsigned long long)timeout_ns * divided_apic_freq;
>>>     write_msr(X2APIC_TMICT, ticks / NS_PER_SEC);
>>>    f15b5:   48 89 f8                mov    %rdi,%rax
>>>    f15b8:   b9 00 ca 9a 3b          mov    $0x3b9aca00,%ecx
>>>    f15bd:   31 d2                   xor    %edx,%edx
>>>    f15bf:   48 0f af 05 f1 10 ff    imul   -0xef0f(%rip),%rax        # e26b8 <divided_apic_freq>
>>>    f15c6:   ff 
>>>    f15c7:   48 f7 f1                div    %rcx
>>>    f15ca:   b9 38 08 00 00          mov    $0x838,%ecx
>>>    f15cf:   48 89 c2                mov    %rax,%rdx
>>>    f15d2:   48 c1 ea 20             shr    $0x20,%rdx
>>>    f15d6:   0f 30                   wrmsr  
>>>    f15d8:   c3                      retq   
>>>
>>> It is the *wrmsr* inside apic_timer_set that generates the fault.
>>> Since I'm not an x86 expert (I'm more of an embedded guy), I'm asking for
>>> tips and ideas.
>>
>> Interesting. This writes to a 32-bit x2APIC register. The manual states:
>> "The upper 32-bits of all x2APIC MSRs (except for the ICR) are
>> reserved." But the timer value calculation lets EDX (lower part of RDX)
>> become non-zero.
>>
>>> RAX: 0x000000044b82f9d8 RBX: 0x00000000000f060f RCX: 0x0000000000000838
>>> RDX: 0x0000000000000004 RSI: 0x0000000000000a36 RDI: 0xffffffffffffe134
>>
>> Never tested if hardware actually explodes over this, but it would have
>> the right to do so. Simple check: confine ticks / NS_PER_SEC to 32 bits
>> and see if that resolves the crash.
>>
>> But that may cause issues regarding the desired timeout. A careful
>> analysis of what happens here /wrt timeout calculation will be needed.
>> E.g. what is the timeout_ns value in those cases?
>>
>> As you copied from apic-demo and use the inmates library, those may
>> share the issue.
>>
>> Jan
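A small sketch in C of the 32-bit confinement suggested above. The RAX value in the register dump does not fit in 32 bits, so the shr $0x20 leaves 4 in EDX. RDI, which apic_timer_set never modifies and so presumably still holds timeout_ns, reads as -7884 when interpreted as a signed value. The helper names below (msr_low, msr_high, timeout_to_tmict) are hypothetical and not part of the inmates library. Clamping a non-positive timeout to one tick is an assumption about the desired behaviour, not a tested fix.

```c
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/* Lower half of a 64-bit value, i.e. what wrmsr takes in EAX. */
static uint32_t msr_low(uint64_t value)
{
    return (uint32_t)value;
}

/* Upper half of a 64-bit value, i.e. what wrmsr takes in EDX. */
static uint32_t msr_high(uint64_t value)
{
    return (uint32_t)(value >> 32);
}

/*
 * Hypothetical helper: convert a timeout in nanoseconds into a value that
 * fits the 32-bit initial-count register, so that EDX stays zero.
 * Assumptions (not from the original code):
 *  - a timeout that is already in the past should fire as soon as possible,
 *    so it is clamped to 1 tick (writing 0 would stop the timer);
 *  - a timeout that is too long is clamped to the largest 32-bit count.
 * The multiplication can still wrap for very large inputs, as in the original.
 */
static uint32_t timeout_to_tmict(int64_t timeout_ns, uint64_t divided_apic_freq)
{
    uint64_t ticks;

    if (timeout_ns <= 0)
        return 1;

    ticks = (uint64_t)timeout_ns * divided_apic_freq / NS_PER_SEC;
    if (ticks == 0)
        return 1;

    return ticks > UINT32_MAX ? UINT32_MAX : (uint32_t)ticks;
}
```

With the dumped value, msr_high(0x44b82f9d8) is 4 and msr_low(0x44b82f9d8) is 0x4b82f9d8, which is the non-zero EDX described above. timeout_to_tmict(-7884, f) returns 1 for any frequency f.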
> 
> Thank you,
> 
> I was able to fix the previous issue.
> Moreover, I changed the APIC timer configuration: I'm now using it in 
> TSC-deadline mode, which gives better frequency stability in PWM generation.
> 
> But while adding features to my demo I discovered what I think is a race that 
> can occur when an "instruction trap" (like the one needed to handle an In/Out 
> instruction) coincides with a local IRQ (the one generated by the APIC timer).
> 
> When this scenario happens, it seems that the context of the main function is 
> not correctly restored (volatile registers are corrupted; in particular the 
> %EDX register, used as the source register for the In instruction, is zeroed).
> 
> Exception Message:
> 
> FATAL: Invalid PIO read, port: 0 size: 1
> RIP: 0x00000000000f0228 RSP: 0x00000000000dffd0 FLAGS: 246
> RAX: 0x0000000000000000 RBX: 0x00000000000f05a8 RCX: 0x0000000000000000
> RDX: 0x0000000000000000 RSI: 0x0000000000000a35 RDI: 0x0000000000000a36

Already rax is zero. But if you look at irq_common in
inmates/lib/x86/int.c, you see that both rax and rdx are saved/restored
on interrupts. Seems more likely that something goes wrong with the
stack / rsp (stack pointer).

> CS: 10 BASE: 0x0000000000000000 AR-BYTES: a09b EFER.LMA 1
> CR0: 0x0000000080010031 CR3: 0x00000000000f3000 CR4: 0x0000000000002020
> EFER: 0x0000000000000500
> Parking CPU 3 (Cell: "pwm-demo")
> Closing cell "pwm-demo"
> Page pool usage after cell destruction: mem 4316/16327, remap 16459/131072
> CPU 3 received SIPI, vector 98
> 
> Faulty Code:
> 
> static inline u8 inb(u16 port)
> {
>    f021a:     89 f8                   mov    %edi,%eax
>    f021c:     66 89 44 24 ec          mov    %ax,-0x14(%rsp)
>       u8 v;
>       asm volatile("inb %1,%0" : "=a" (v) : "dN" (port));
>    f0221:     0f b7 44 24 ec          movzwl -0x14(%rsp),%eax

This pattern looks suspicious: the transfer of the port over the stack
happens via an unreserved area, one that is overwritten when an
interrupt hits right after mov %ax,-0x14(%rsp)... Ah, we are missing a
magic switch. Does this help?

diff --git a/inmates/lib/x86/Makefile.lib b/inmates/lib/x86/Makefile.lib
index f54259d..54bddae 100644
--- a/inmates/lib/x86/Makefile.lib
+++ b/inmates/lib/x86/Makefile.lib
@@ -10,7 +10,7 @@
 # the COPYING file in the top-level directory.
 #
 
-KBUILD_CFLAGS += -m64
+KBUILD_CFLAGS += -m64 -mno-red-zone
 GCOV_PROFILE := n
 
 define DECLARE_TARGETS =

But I'm afraid the hypervisor needs it as well, and we were very lucky
so far...
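
For context on why the switch matters: the x86-64 System V ABI lets leaf functions use the 128 bytes below %rsp (the red zone) without moving the stack pointer. That is safe in user space, but an interrupt delivered on the same stack pushes its frame right below %rsp and overwrites whatever the function parked there, which matches the mov %ax,-0x14(%rsp) / movzwl -0x14(%rsp),%eax pair above. A minimal sketch for checking this locally follows. spill_port is a hypothetical stand-in for inb, and the exact offsets depend on compiler version and optimization level.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for inb(): a leaf function that keeps a copy of its
 * argument in a stack slot. On x86-64 with the red zone enabled, the compiler
 * may place that slot below %rsp without adjusting the stack pointer first.
 */
uint16_t spill_port(uint16_t port)
{
    volatile uint16_t copy = port;  /* volatile forces a real stack slot */

    /* An interrupt arriving here on the same stack could clobber `copy`. */
    return copy;
}
```

Compiling this once with "gcc -O0 -S" and once with "gcc -O0 -mno-red-zone -S" and comparing the output should show the difference: without the switch the function typically stores the local without lowering %rsp first, while with the switch it reserves the space explicitly.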

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux
