On 2017-03-23 12:50, [email protected] wrote:
> On Saturday, 18 March 2017 09:32:19 UTC+1, Jan Kiszka wrote:
>> On 2017-03-17 14:42, [email protected] wrote:
>>> On Friday, 17 March 2017 13:43:32 UTC+1, J. Kiszka wrote:
>>>> On 2017-03-17 13:06, Claudio Scordino wrote:
>>>>> Dear all,
>>>>>
>>>>> we are facing an unexpected exception when using the APIC timer to
>>>>> drive a GPIO as a software PWM.
>>>>>
>>>>> The platform is x86. The software runs in a bare-metal cell. The PWM
>>>>> frequency is 5 kHz.
>>>>>
>>>>> When the duty cycle is very high or very low (i.e., two subsequent
>>>>> interrupts get close together), we hit the following exception:
>>>>>
>>>>> FATAL: Unhandled VM-Exit, reason 2
>>>>> qualification 0
>>>>> vectoring info: 0 interrupt info: 0
>>>>> RIP: 0x00000000000f15d6 RSP: 0x00000000000dff08 FLAGS: 10002
>>>>
>>>> "objdump -dS inmate-linked.o" can tell you which instruction at RIP
>>>> causes this fault. It's a triple fault, likely started off by a general
>>>> protection or page fault.
>>>>
>>>> Jan
>>>
>>> Hi, I'm Errico, Claudio's coworker, and I'm currently looking into this issue.
>>>
>>> The fault happens when we re-arm the APIC timer:
>>>
>>> 00000000000f15b5 <apic_timer_set>:
>>>
>>> void apic_timer_set(unsigned long timeout_ns)
>>> {
>>> unsigned long long ticks =
>>> (unsigned long long)timeout_ns * divided_apic_freq;
>>> write_msr(X2APIC_TMICT, ticks / NS_PER_SEC);
>>> f15b5: 48 89 f8 mov %rdi,%rax
>>> f15b8: b9 00 ca 9a 3b mov $0x3b9aca00,%ecx
>>> f15bd: 31 d2 xor %edx,%edx
>>> f15bf: 48 0f af 05 f1 10 ff imul -0xef0f(%rip),%rax #
>>> e26b8 <divided_apic_freq>
>>> f15c6: ff
>>> f15c7: 48 f7 f1 div %rcx
>>> f15ca: b9 38 08 00 00 mov $0x838,%ecx
>>> f15cf: 48 89 c2 mov %rax,%rdx
>>> f15d2: 48 c1 ea 20 shr $0x20,%rdx
>>> f15d6: 0f 30 wrmsr
>>> f15d8: c3 retq
>>>
>>> It is the *wrmsr* inside apic_timer_set that generates the fault.
>>> Since I'm not an x86 expert (I'm more of an embedded guy), I'm asking for
>>> tips and ideas.
>>
>> Interesting. This writes to a 32-bit x2APIC register. The manual states:
>> "The upper 32-bits of all x2APIC MSRs (except for the ICR) are
>> reserved." But the timer value calculation lets EDX (the lower half of
>> RDX, which wrmsr takes as the upper 32 bits of the MSR value) become
>> non-zero.
>>
>>> RAX: 0x000000044b82f9d8 RBX: 0x00000000000f060f RCX: 0x0000000000000838
>>> RDX: 0x0000000000000004 RSI: 0x0000000000000a36 RDI: 0xffffffffffffe134
>>
>> Never tested if hardware actually explodes over this, but it would have
>> the right to do so. Simple check: confine ticks / NS_PER_SEC to 32 bits
>> and see if that resolves the crash.
>>
>> But that may cause issues regarding the desired timeout. A careful
>> analysis of what happens here w.r.t. the timeout calculation will be
>> needed. E.g., what is the timeout_ns value in those cases?
>>
>> As you copied from apic-demo and use the inmates library, those may
>> share the issue.
>>
>> Jan
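To make the "confine to 32 bits" suggestion above concrete, here is a minimal sketch in C. It is not the inmates library code: timeout_to_ticks is a hypothetical helper, and taking the timeout as a signed value is an assumption. That assumption is motivated by the register dump quoted above: RDI still holds the first argument at the wrmsr (the function never modifies it), and 0xffffffffffffe134 reads as -7884 when interpreted as signed, which would suggest the caller computed a negative timeout. The helper saturates instead of letting the upper 32 bits become non-zero.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/*
 * Hypothetical helper (not part of the inmates library): convert a timeout
 * in nanoseconds into a 32-bit APIC timer initial count.
 *
 * Assumptions of this sketch:
 *  - the timeout is signed, so a deadline that already passed shows up
 *    as a negative value instead of a huge unsigned one;
 *  - freq_hz is the divided APIC timer frequency and fits in 32 bits.
 */
static uint32_t timeout_to_ticks(int64_t timeout_ns, uint64_t freq_hz)
{
	assert(freq_hz > 0 && freq_hz <= UINT32_MAX);

	/* Deadline already passed: fire as soon as possible.
	 * An initial count of 0 would stop the timer, so use 1. */
	if (timeout_ns <= 0)
		return 1;

	/* Split into seconds and remainder so the multiplications
	 * cannot overflow 64 bits. */
	uint64_t secs = (uint64_t)timeout_ns / NS_PER_SEC;
	uint64_t rem_ns = (uint64_t)timeout_ns % NS_PER_SEC;

	/* Saturate early if the seconds part alone exceeds 32 bits. */
	if (secs > UINT32_MAX / freq_hz)
		return UINT32_MAX;

	uint64_t ticks = secs * freq_hz + rem_ns * freq_hz / NS_PER_SEC;

	if (ticks == 0)
		return 1;
	return ticks > UINT32_MAX ? UINT32_MAX : (uint32_t)ticks;
}
```

The result could then be passed as write_msr(X2APIC_TMICT, timeout_to_ticks(...)), so that EDX is always zero at the wrmsr. Whether saturating is the right policy for the PWM is a separate question; it only avoids the reserved-bits write.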
>
> Thank You,
>
> I was able to fix the previous issue.
> Moreover, I changed the APIC timer configuration: I'm now using it in
> TSC-deadline mode, which gives better frequency stability in the PWM
> generation.
>
> But while adding features to my demo, I discovered what I think is a race
> that can happen when an instruction trap (like the one needed to handle
> an in/out instruction) coincides with a local IRQ (the one generated by
> the APIC timer).
>
> When this happens, it seems that the context of the main function is not
> correctly restored (volatile registers are corrupted; in particular, %edx,
> which is used as the source register of the in instruction, is zeroed).
>
> Exception Message:
>
> FATAL: Invalid PIO read, port: 0 size: 1
> RIP: 0x00000000000f0228 RSP: 0x00000000000dffd0 FLAGS: 246
> RAX: 0x0000000000000000 RBX: 0x00000000000f05a8 RCX: 0x0000000000000000
> RDX: 0x0000000000000000 RSI: 0x0000000000000a35 RDI: 0x0000000000000a36

rax is already zero here. But if you look at irq_common in
inmates/lib/x86/int.c, you will see that both rax and rdx are saved and
restored on interrupts. It seems more likely that something goes wrong
with the stack / rsp (stack pointer).

> CS: 10 BASE: 0x0000000000000000 AR-BYTES: a09b EFER.LMA 1
> CR0: 0x0000000080010031 CR3: 0x00000000000f3000 CR4: 0x0000000000002020
> EFER: 0x0000000000000500
> Parking CPU 3 (Cell: "pwm-demo")
> Closing cell "pwm-demo"
> Page pool usage after cell destruction: mem 4316/16327, remap 16459/131072
> CPU 3 received SIPI, vector 98
>
> Faulty Code:
>
> static inline u8 inb(u16 port)
> {
> f021a: 89 f8 mov %edi,%eax
> f021c: 66 89 44 24 ec mov %ax,-0x14(%rsp)
> u8 v;
> asm volatile("inb %1,%0" : "=a" (v) : "dN" (port));
> f0221: 0f b7 44 24 ec movzwl -0x14(%rsp),%eax

This pattern looks suspicious: the transfer of the port over the stack
happens via an unreserved area, one that is overwritten when an
interrupt hits right after mov %ax,-0x14(%rsp)... Ah, we are missing a
magic switch. Does this help?

diff --git a/inmates/lib/x86/Makefile.lib b/inmates/lib/x86/Makefile.lib
index f54259d..54bddae 100644
--- a/inmates/lib/x86/Makefile.lib
+++ b/inmates/lib/x86/Makefile.lib
@@ -10,7 +10,7 @@
 # the COPYING file in the top-level directory.
 #
 
-KBUILD_CFLAGS += -m64
+KBUILD_CFLAGS += -m64 -mno-red-zone
 GCOV_PROFILE := n
 
 define DECLARE_TARGETS =

But I'm afraid the hypervisor needs it as well, and we have been very
lucky so far...
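
For readers unfamiliar with the red zone: the x86-64 System V ABI lets leaf functions use up to 128 bytes below rsp without adjusting rsp, and -mno-red-zone tells GCC not to do that. The toy model below is only a simulation in plain C, not inmate or hypervisor code. It assumes the interrupt frame is pushed onto the same stack (no stack switch) and that rsp is already 16-byte aligned; the struct, the function names and the SS value are made up for illustration. Under those assumptions the saved RFLAGS occupies rsp-24..rsp-17, so its zero upper bytes land exactly on the port stored at -0x14(%rsp). That would match the reported "port: 0", but it is an inference, not something confirmed in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_STACK_SIZE 256

/* Toy model of a downward-growing stack; rsp is an index into stack[]. */
struct toy_cpu {
	uint8_t stack[TOY_STACK_SIZE];
	size_t rsp;
};

static void toy_push64(struct toy_cpu *cpu, uint64_t val)
{
	cpu->rsp -= 8;
	memcpy(&cpu->stack[cpu->rsp], &val, sizeof(val));
}

/*
 * Model of 64-bit interrupt delivery on the current stack: the CPU pushes
 * SS, RSP, RFLAGS, CS and RIP. The handler body is omitted; the final
 * assignment stands in for iretq restoring the interrupted rsp.
 */
static void toy_deliver_interrupt(struct toy_cpu *cpu, uint64_t rflags)
{
	size_t old_rsp = cpu->rsp;

	toy_push64(cpu, 0x18);     /* SS: placeholder value */
	toy_push64(cpu, old_rsp);  /* interrupted RSP */
	toy_push64(cpu, rflags);   /* RFLAGS: upper 32 bits are zero */
	toy_push64(cpu, 0x10);     /* CS, as in the register dump */
	toy_push64(cpu, 0xf0221);  /* RIP of the instruction after the store */

	cpu->rsp = old_rsp;
}

/*
 * Mimics the compiled inb() prologue: store the port at -0x14(%rsp)
 * without moving rsp, optionally take an interrupt, then reload it.
 */
static uint16_t toy_reload_port(struct toy_cpu *cpu, uint16_t port,
				int irq_between)
{
	uint16_t reloaded;

	/* mov %ax,-0x14(%rsp) */
	memcpy(&cpu->stack[cpu->rsp - 0x14], &port, sizeof(port));

	if (irq_between)
		toy_deliver_interrupt(cpu, 0x246);

	/* movzwl -0x14(%rsp),%eax */
	memcpy(&reloaded, &cpu->stack[cpu->rsp - 0x14], sizeof(reloaded));
	return reloaded;
}
```

In this model, toy_reload_port returns the original port when no interrupt intervenes and 0 when one hits between the store and the reload. With -mno-red-zone the compiler would reserve the slot by moving rsp first, so the pushed frame would land below it.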
Jan
--
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux