On 29.6.2012 5:19, Adam Hraska wrote:
> void tlb_shootdown_ipi_recv(void)
> {
> // snip
> for (i = 0; i < CPU->tlb_messages_count; CPU->tlb_messages_count--) {
> tlb_invalidate_type_t type = CPU->tlb_messages[i].type;
>
> // snip, no changes to "i" or tlb_message_count
>
> if (type == TLB_INVL_ALL)
> break;
> }
> //snip, no change to "i" or tlb_message_count
> }
>
> There are two minor bugs:
> 1) The for() loop executes the first message in the
> queue "tlb_messages_count" times instead of processing
> all messages one by one. So instead of:
> for (i = 0; i < CPU->tlb_messages_count; CPU->tlb_messages_count--) {
> }
> .. it should be:
> for (i = 0; i < CPU->tlb_messages_count; i++) {
> }
> CPU->tlb_messages_count = 0;
Right, except I wouldn't call it a minor bug. The whole loop is very
suspicious. It was introduced by me in revision 1002 of the old svn
repository in March 2006. One would have thought that I would have been
able to write a simple for loop back then.
> 2) If it encounters a message/request to flush the
> entire TLB, it breaks from the loop because the remaining
> messages have nothing to change in the TLB. Unfortunately,
> tlb_messages_count is left unchanged, so the next time
> tlb_shootdown_ipi_recv() runs, it will process previous
> messages again. Ooops :-).
Unfortunate, but not as bad as not processing the previous messages at
all like in the previous case.
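For the record, both fixes can be combined in one loop: iterate with i++ and
reset tlb_messages_count unconditionally after the loop, even when
TLB_INVL_ALL causes an early break. Below is a minimal, self-contained sketch
of that corrected receive path; the surrounding scaffolding (cpu_t, the queue
length, the invalidation counters) is invented for illustration and does not
match the real kernel structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types; the field
 * names follow the snippet above, but everything else is invented. */
typedef enum {
	TLB_INVL_PAGES,
	TLB_INVL_ALL
} tlb_invalidate_type_t;

typedef struct {
	tlb_invalidate_type_t type;
} tlb_message_t;

#define TLB_MESSAGE_QUEUE_LEN  8

typedef struct {
	tlb_message_t tlb_messages[TLB_MESSAGE_QUEUE_LEN];
	size_t tlb_messages_count;
} cpu_t;

static cpu_t cpu;
#define CPU  (&cpu)

/* Counters standing in for the actual TLB invalidation work. */
static int pages_invalidated;
static int invalidated_all;

/* Sketch of the corrected handler: advance i (not the count), and
 * always drain the queue so stale messages are never reprocessed by
 * the next IPI. */
void tlb_shootdown_ipi_recv(void)
{
	size_t i;

	for (i = 0; i < CPU->tlb_messages_count; i++) {
		tlb_invalidate_type_t type = CPU->tlb_messages[i].type;

		if (type == TLB_INVL_ALL) {
			invalidated_all++;
			/* The remaining messages are subsumed by a
			 * full flush. */
			break;
		}
		pages_invalidated++;
	}

	/* Reset unconditionally, even after an early break. */
	CPU->tlb_messages_count = 0;
}
```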
> Race?
> -----
> I am not sure if the code accounts for the possibility
> of a race between an unrelated interrupt and a pending
> TLB shootdown IPI (i.e. its handler tlb_shootdown_ipi_recv())
> after tlb_shootdown_finalize() returns.
>
> Imagine two cpus invoke tlb_shootdown_start() at the
> same time. Say cpu1 acquires the tlblock spinlock and
> cpu2 remains spinning for tlblock. cpu1 unmaps a page
> and calls tlb_shootdown_finalize(). Now cpu2 should
> remove the mapping from its local TLB. It, however,
> executes the whole tlb_shootdown_start(). Interrupts
> are disabled so until the kernel/cpu2 calls
> tlb_shootdown_finalize() (which restores interrupts)
> it won't invoke its pending TLB IPI handler
> tlb_shootdown_ipi_recv() and remove the outdated
> mapping from its local TLB. cpu2, therefore, calls
> tlb_shootdown_finalize() and restores interrupts.
> The problem is that even though
> tlb_shootdown_ipi_recv()'s interrupt is pending, it
> might not be processed right after cpu2's
> tlb_shootdown_finalize() due to another pending
> interrupt. Depending
> on the order in which interrupts are processed this
> other interrupt may be handled first and may access
> the unmapped page (but still in cpu2's TLB) and corrupt
> a frame before tlb_shootdown_ipi_recv runs and purges
> the TLB. Suppose that the other interrupt handles, e.g.,
> the network card and the just unmapped page happens
> to be a buffer supplied by user space to store incoming
> traffic. In absence of the race, the kernel would kill
> the offending user program for accessing unmapped memory.
> In presence of the race, the user program may alter
> contents of a kernel owned frame.
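The interleaving described above can be condensed into a toy model: two
interrupts are pending when cpu2 re-enables them at the end of
tlb_shootdown_finalize(), and if they are delivered in fixed vector order, the
device handler may run before the shootdown IPI handler has purged the stale
translation. All names below (the vectors, handlers, and flags) are invented
for illustration; this is not the kernel's actual delivery logic.

```c
#include <assert.h>
#include <stdbool.h>

enum { VEC_NET = 0, VEC_TLB_IPI = 1, VEC_COUNT = 2 };

static bool pending[VEC_COUNT];
static bool tlb_has_stale_mapping = true;
static bool touched_stale_page = false;

/* Stands in for an unrelated device handler (e.g. a NIC driver) that
 * accesses a buffer whose mapping was just torn down elsewhere. */
static void net_handler(void)
{
	if (tlb_has_stale_mapping)
		touched_stale_page = true;  /* hits the unmapped frame */
}

/* Stands in for tlb_shootdown_ipi_recv(): purges the local TLB. */
static void tlb_ipi_handler(void)
{
	tlb_has_stale_mapping = false;
}

/* Model of re-enabling interrupts: deliver everything pending in fixed
 * vector order. If VEC_NET sorts before VEC_TLB_IPI, the device handler
 * wins the race. */
static void interrupts_restore(void)
{
	for (int v = 0; v < VEC_COUNT; v++) {
		if (!pending[v])
			continue;
		pending[v] = false;
		if (v == VEC_NET)
			net_handler();
		else
			tlb_ipi_handler();
	}
}
```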
Interesting. What you describe here seems possible in general,
especially if we allowed wild interrupt handlers in the kernel. The
question is whether it could happen in reality, given all the current
constraints.
The only two kinds of memory that can be unmapped or have their flags
changed, and hence require a TLB shootdown, are userspace pages and
kernel non-identity
pages. TLB shootdown also happens when stealing an address space
identifier for an address space that wants to run and does not have one.
Kernel interrupt handlers may not touch userspace memory for obvious
reasons, but are allowed to touch kernel non-identity mappings. The
kernel non-identity memory is managed exclusively by the kernel and so
it looks to me that this race can only happen if there is a
use-after-free bug in the kernel, such as forgetting to unregister an
IRQ pseudocode handler after freeing up its virtual address ranges.
Similarly, all interrupt handlers run with the kernel ASID, which cannot
be stolen.
So at this point, with my current understanding of the situation, I
don't think this is something that needs fixing. Pretty please, correct
me if I am wrong :-)
Thanks once again for pointing out the other problems though. I will try
to address these issues over the weekend, or, if someone volunteers to
fix them on their own, I will be happy to merge the fix or accept patches.
Jakub
_______________________________________________
HelenOS-devel mailing list
[email protected]
http://lists.modry.cz/cgi-bin/listinfo/helenos-devel