On Wed, 11 May 2016, Tetsuo Handa wrote:
> Thomas Gleixner wrote:
> > On Mon, 9 May 2016, Tetsuo Handa wrote:
> > >
> > > It seems to me that APIC_BASE APIC_ICR APIC_ICR_BUSY are all constant
> > > regardless of calling cpu. Thus, native_apic_mem_read() and
> > > native_apic_mem_write() are using globally shared constant memory
> > > address and __xapic_wait_icr_idle() is making decision based on
> > > globally shared constant memory address. Am I right?
> >
> > No. The APIC address space is per cpu. It's the same address but it's always
> > accessing the local APIC of the cpu on which it is called.
>
> Same address but per CPU magic. I see.
>
> Now, I'm trying with CONFIG_TRACE_IRQFLAGS=y and I can observe that
> irq event stamp shows that hardirqs are disabled for two CPUs when I hit
> this bug. It seems to me that this bug is triggered when two CPUs are
> concurrently calling smp_call_function_many() with wait == true.
>
> [ 180.434649] hardirqs last  enabled at (5324977): [<ffff88007860f990>] 0xffff88007860f990
> [ 180.434650] hardirqs last disabled at (5324978): [<ffff88007860f990>] 0xffff88007860f990

Those addresses are on the stack !?! That makes no sense whatsoever.

> [ 180.434659] task: ffff88007a046440 ti: ffff88007860c000 task.ti: ffff88007860c000
> [ 180.434665] RIP: 0010:[<ffffffff811105bf>]  [<ffffffff811105bf>] smp_call_function_many+0x21f/0x2c0
> [ 180.434666] RSP: 0000:ffff88007860f950  EFLAGS: 00000202

And on this CPU interrupts are enabled because the IF bit (9) in EFLAGS is set.

> [ 180.548951] hardirqs last  enabled at (601147): [<ffff880078cffa00>] 0xffff880078cffa00
> [ 180.551359] hardirqs last disabled at (601148): [<ffff880078cffa00>] 0xffff880078cffa00

Equally crap.

> [ 180.563802] task: ffff880077ad1940 ti: ffff880078cfc000 task.ti: ffff880078cfc000
> [ 180.565984] RIP: 0010:[<ffffffff811105bf>]  [<ffffffff811105bf>] smp_call_function_many+0x21f/0x2c0
> [ 180.568517] RSP: 0000:ffff880078cff9c0  EFLAGS: 00000202

And again interrupts are enabled.

Thanks,
tglx