On 15.11.2010 21:20, Philippe Gerum wrote:
> On Mon, 2010-11-15 at 20:31 +0100, Jan Kiszka wrote:
>> Hi Philippe,
>>
>> debugging some variant of I-pipe over an x86-32 target, I think I found
>> some fairly old flaw in the IRQ virtualization that causes rescheduling
>> delays (up to deadlocks) for Linux:
>>
>> - we are in sysenter_tail (other exit paths should be affected as well)
>> - we DISABLE_INTERRUPTS, but only virtually
>> - we go past "testl $_TIF_ALLWORK_MASK, %ecx", nothing to be done
>> - an IRQ for Linux arrives, it is pushed to the backlog
>> - __ipipe_unstall_iret_root replays the IRQ as the regs we are about to
>>   return to have IF set (obviously, we return from a syscall)
>> - the Linux IRQ handler sets _TIF_NEED_RESCHED, but doesn't perform the
>>   work on return as __ipipe_sync_stage set the stall flag for the Linux
>>   domain before calling the handler
>> - but now the preempted sysenter return also does no reschedule as it
>>   already passed the check - bang!
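
The sequence above can be sketched as a toy simulation. This is only a model for following the ordering of events, not I-pipe code: the class `RootStage`, the function `sysenter_exit` and all attribute names are hypothetical, and the assumption that the IRQ exit path skips reschedule work while the stage is stalled is taken from the description above.

```python
# Toy model of the sysenter exit race. All names are hypothetical.
class RootStage:
    def __init__(self):
        self.stalled = False        # virtual interrupt mask of the Linux domain
        self.backlog = []           # IRQs logged while the stage is stalled
        self.need_resched = False   # stands in for _TIF_NEED_RESCHED
        self.schedule_calls = 0

    def schedule(self):
        self.need_resched = False
        self.schedule_calls += 1

    def linux_irq_handler(self):
        self.need_resched = True
        # IRQ exit path: reschedule work is skipped while the stage is stalled.
        if not self.stalled:
            self.schedule()

    def sync_stage(self):
        # Replay the backlog; the stall flag is set around each handler call.
        while self.backlog:
            handler = self.backlog.pop(0)
            self.stalled = True
            handler()
            self.stalled = False


def sysenter_exit(stage, irq_after_check):
    """Return True if we reach user space with a reschedule request pending."""
    stage.stalled = True                 # DISABLE_INTERRUPTS, virtual only
    if stage.need_resched:               # testl $_TIF_ALLWORK_MASK, %ecx
        stage.schedule()
    if irq_after_check:                  # hardware IRQs are still enabled
        stage.backlog.append(stage.linux_irq_handler)
    stage.stalled = False                # unstall: the saved regs have IF set
    stage.sync_stage()                   # replay happens after the check
    return stage.need_resched
```

Calling `sysenter_exit(RootStage(), True)` in this model returns with the flag set and `schedule()` never called, which is the missed reschedule described in the last step.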
> 
> Ouch. You must have had a really busy Monday to find this one.
> 
>>
>> Another variant of this Linux rescheduling issue:
>>
>> - we are in a lengthy loop inside the kernel, but we are preemptible
>>   most of the time
>> - after disabling Linux IRQs briefly, we are calling
>>   local_irq_enable() again
>> - in the meantime, we received a Linux IRQ which is now pending in the
>>   backlog
>> - __ipipe_unstall_root triggers __ipipe_sync_stage
>> - Linux handler is called, sets NEED_RESCHED but does not reschedule
>>   (see above)
>> - we do not test for resched again as we are not returning to user
>>   space, and will not do so for quite some time - bang!
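
A rough way to see the size of the delay in this second variant is to count loop iterations that run while the reschedule request is pending. The sketch below is a toy model under the same assumptions as the description above; `run_kernel_loop`, `irq_at` and the other names are hypothetical and do not correspond to real kernel symbols.

```python
# Toy model of the in-kernel loop variant. All names are hypothetical.
def run_kernel_loop(iterations, irq_at):
    """Return how many loop iterations run with a reschedule request pending."""
    state = {"stalled": False, "need_resched": False}
    backlog = []
    pending_iterations = 0

    def linux_irq_handler():
        state["need_resched"] = True
        # IRQ exit path: no reschedule while the root stage is stalled.
        if not state["stalled"]:
            state["need_resched"] = False   # schedule() would run here

    def unstall_root():
        # Stands in for __ipipe_unstall_root triggering __ipipe_sync_stage.
        state["stalled"] = False
        while backlog:
            handler = backlog.pop(0)
            state["stalled"] = True         # stall flag set before the handler
            handler()
            state["stalled"] = False

    for i in range(iterations):
        if state["need_resched"]:
            pending_iterations += 1         # work done while a resched is due
        state["stalled"] = True             # local_irq_disable(), virtual only
        if i == irq_at:
            backlog.append(linux_irq_handler)   # IRQ arrives and is logged
        unstall_root()                      # local_irq_enable()
    return pending_iterations
```

In this model, `run_kernel_loop(1000, 10)` leaves the request pending for all 989 remaining iterations, because nothing in the loop ever re-tests the flag.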
>>
>> I think both issues are only related to virtualizing DISABLE_INTERRUPTS
>> for entry_32.S and I wonder if this doesn't finally qualify for a switch
>> to the 64-bit model. Or do you see simpler fixes?
>>
> 
> We could probably use hw masking from sysenter_tail and on, but quite
> frankly, I think this time, enough is enough and this bug calls for a
> radical fix, which is indeed getting rid of interrupt virtualization in
> the kernel entry/exit paths for x86_32, which no other arch ever
> implemented anyway.
> 
> The decision to virtualize there as well was taken circa 2.4.18, when
> upstream did not care that much about latency yet. Things have changed,
> and there is no more reason to virtualize interrupts in very short
> critical sections, at the expense of a lot more complexity.
> 
> - __ipipe_unstall_iret_root
> - __ipipe_kpreempt_root
> and much of the nonsense we do to track linux's interrupt state would go
> away.
> 

Much of the code involved is shared here, so I will check with $customer
whether and how we can contribute to such a cleanup.

Jan


_______________________________________________
Adeos-main mailing list
[email protected]
https://mail.gna.org/listinfo/adeos-main
