On 11/08/17 11:56, Peter Zijlstra wrote:
> On Fri, Aug 11, 2017 at 11:23:10AM +0200, Vitaly Kuznetsov wrote:
>> Peter Zijlstra <pet...@infradead.org> writes:
>>> On Thu, Aug 10, 2017 at 07:08:22PM +0000, Jork Loeser wrote:
>>>>>> Subject: Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote
>>>>>> TLB flush
>>>>>> Hold on.. if we don't IPI for TLB invalidation, what serializes our
>>>>>> software page table walkers like fast_gup()?
>>>>> Hypervisor may implement this functionality via an IPI.
>>>>> K. Y
>>>> HvFlushVirtualAddressList() states:
>>>> This call guarantees that by the time control returns back to the
>>>> caller, the observable effects of all flushes on the specified virtual
>>>> processors have occurred.
>>>> HvFlushVirtualAddressListEx() refers to HvFlushVirtualAddressList() as
>>>> adding sparse target VP lists.
>>>> Is this enough of a guarantee, or do you see other races?
>>> That's nowhere near enough. We need the remote CPU to have completed any
>>> guest IF section that was in progress at the time of the call.
>>> So if a host IPI can interrupt a guest while the guest has IF cleared,
>>> and we then process the host IPI -- clear the TLBs -- before resuming the
>>> guest, which still has IF cleared, we've got a problem.
>>> Because at that point, our software page-table walker, that relies on IF
>>> being clear to guarantee the page-tables exist, because it holds off the
>>> TLB invalidate and thereby the freeing of the pages, gets its pages
>>> ripped out from under it.
>> Oh, I see your concern. Hyper-V, however, is not the first x86
>> hypervisor trying to avoid IPIs on remote TLB flush; Xen does this
>> too. Briefly looking at xen_flush_tlb_others() I don't see anything
>> special. Do we know how serialization is achieved there?
> No idea on how Xen works, I always just hope it goes away :-) But let's
> ask some Xen folks.
How is a software page-table walker that relies on IF being clear safe at
all (on native, let alone under virtualisation)? Hardware has no
architectural requirement to keep entries in the TLB.
In the virtualisation case, the vcpu can be scheduled on a different pcpu
at any point, even during a critical region like that, so the TLB really
can empty itself under your feet.
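
For concreteness, the scheme described above (an IF-clear walker that holds
off the flush IPI, and thereby the freeing of page-table pages) can be
modelled roughly as below. This is a toy user-space sketch, not kernel or
hypervisor code: every name in it (toy_cpu, toy_walk_begin, and so on) is
hypothetical, and the "hypercall" variant only models the feared behaviour
of a flush that completes without the guest's cooperation.

```c
#include <stdbool.h>

/* Toy model of one remote (v)CPU. All names here are hypothetical. */
struct toy_cpu {
    bool irqs_off;  /* models IF being cleared */
    bool walking;   /* inside a lockless page-table walk */
};

/* Walker side: the whole walk runs with interrupts disabled. */
static void toy_walk_begin(struct toy_cpu *c)
{
    c->irqs_off = true;
    c->walking = true;
}

static void toy_walk_end(struct toy_cpu *c)
{
    c->walking = false;
    c->irqs_off = false;
}

/* IPI-based flush: cannot complete until the target can take the IPI. */
static bool toy_ipi_flush_done(const struct toy_cpu *target)
{
    return !target->irqs_off;
}

/* Flush done by the host without involving the guest: always completes. */
static bool toy_hypercall_flush_done(const struct toy_cpu *target)
{
    (void)target;
    return true;
}

/*
 * The freeing side releases page-table pages once the flush reports done.
 * That is a hazard if the target is still in the middle of a walk.
 */
static bool toy_free_races_with_walk(const struct toy_cpu *target,
                                     bool flush_done)
{
    return flush_done && target->walking;
}
```

In this model the IPI variant never reports completion while a walk is in
progress, so the free cannot race with it, whereas the hypercall variant
can. The model is only about when the freeing side is allowed to proceed;
it says nothing about whether TLB entries stay resident.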