On 21-10-13 12:10, Thomas Hellstrom wrote:
> On 10/21/2013 11:48 AM, Maarten Lankhorst wrote:
>> On 21-10-13 11:37, Thomas Hellstrom wrote:
>>> On 10/21/2013 11:01 AM, Maarten Lankhorst wrote:
>>>> On 21-10-13 10:48, Thomas Hellstrom wrote:
>>>>> Hi!
>>>>>
>>>>> As discussed previously, the current locking order in TTM for these locks 
>>>>> is bo::reserve -> vm::mmap_sem. This leads to a hack in the TTM fault() 
>>>>> handler to try to reverse the locking order: if a tryreserve failed, we 
>>>>> had the vm code release mmap_sem and then schedule, to give the holder of 
>>>>> bo::reserve a chance to release the lock. This solution is no longer 
>>>>> legal, since we've been more or less kindly asked to remove the 
>>>>> set_need_resched() call.
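(For reference, a rough sketch of the shape of that hack; the plain mutex
stands in for bo::reserve and the helper name is made up, this is not the
actual ttm_bo_vm.c code.)

#include <linux/mm.h>
#include <linux/mutex.h>
#include <linux/sched.h>

/* Old scheme, sketched: tryreserve in the fault handler; on contention,
 * poke the scheduler and return, so the access simply faults again while
 * the current holder of bo::reserve gets a chance to run and unlock. */
static int fault_with_old_hack_sketch(struct mutex *reserve)
{
	if (!mutex_trylock(reserve)) {
		set_need_resched();	/* the call we've been asked to remove */
		return VM_FAULT_NOPAGE;
	}

	/* ... insert PTEs for the faulting range under bo::reserve ... */

	mutex_unlock(reserve);
	return VM_FAULT_NOPAGE;
}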
>>>>>
>>>>> Maarten has proposed to invert the locking order. I've previously said I 
>>>>> had no strong preference. The current locking order dates back to the 
>>>>> time when TTM wasn't using unmap_mapping_range() but walked the page 
>>>>> tables itself, updating PTEs as needed. Furthermore, it was needed for 
>>>>> user bos that used get_user_pages() in the TTM populate and swap-in 
>>>>> methods. User bos were removed some time ago, but I'm looking at re-adding 
>>>>> them. They would suit the VMware model of cached-only pages very well. I 
>>>>> see uses in the Gallium API, XA's DMA functionality, and OpenCL.
>>>>>
>>>>> We would then need a somewhat nicer way to invert the locking order. I've 
>>>>> attached a solution that ups mmap_sem and then reserves, but due to how 
>>>>> the fault API is done, we then need to release the reserve and retry the 
>>>>> fault. This of course opens up the possibility of starvation, but I don't 
>>>>> think starvation at this point is very likely: one thread being unable to 
>>>>> read from or write to a buffer object because the GPU is continuously 
>>>>> busy with it. If this *would* become a problem, it's probably possible to 
>>>>> modify the fault code to allow us to hold locks until the retried fault, 
>>>>> but that would be a bit invasive, since it touches the arch code....
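(Roughly the shape of the attached approach; the plain mutex again stands
in for bo::reserve, and the real code is in the attached patch.)

#include <linux/mm.h>
#include <linux/mutex.h>
#include <linux/rwsem.h>

/* Sketch: on contention, drop mmap_sem first, block until the current
 * holder releases bo::reserve, then ask the fault code to redo the whole
 * fault. The starvation window is the gap before the retried fault, in
 * which someone else may grab the reserve again. */
static int fault_with_inverted_order_sketch(struct vm_area_struct *vma,
					    struct vm_fault *vmf,
					    struct mutex *reserve)
{
	if (!mutex_trylock(reserve)) {
		if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) {
			up_read(&vma->vm_mm->mmap_sem);
			mutex_lock(reserve);	/* wait for the holder... */
			mutex_unlock(reserve);	/* ...then drop it again */
			return VM_FAULT_RETRY;	/* fault is redone from scratch */
		}
		return VM_FAULT_NOPAGE;		/* access just faults again */
	}

	/* ... insert PTEs for the faulting range under bo::reserve ... */

	mutex_unlock(reserve);
	return VM_FAULT_NOPAGE;
}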
>>>>>
>>>>> Basically I'm proposing to keep the current locking order.
>>>> I'm not sure why we have to worry about the mmap_sem lock being taken 
>>>> before bo::reserve. If we already hold mmap_sem, no extra locking is 
>>>> needed for get_user_pages.
>>> Typically, they are populated outside of fault, as part of execbuf, where 
>>> we don't hold and don't want to hold mmap_sem. In fact, user bos should 
>>> not be remappable through the TTM VM system. Anyway, we need to grab 
>>> mmap_sem inside ttm_populate for user buffers.
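(A sketch of what that populate path could look like; the ttm_tt fields
are as in TTM, but the helper, the way the user address gets passed in,
and the error handling, or lack of it, are made up for the example.)

#include <linux/mm.h>
#include <linux/sched.h>
#include <drm/ttm/ttm_bo_driver.h>

/* Sketch: take mmap_sem here, inside populate, so execbuf itself never
 * needs to hold it around bo::reserve. "start" is the userspace address
 * of the buffer, stored wherever the driver keeps it for a user bo.
 * Releasing partially pinned pages on failure is omitted. */
static int user_bo_populate_sketch(struct ttm_tt *ttm, unsigned long start)
{
	long got;

	down_read(&current->mm->mmap_sem);
	got = get_user_pages(current, current->mm, start, ttm->num_pages,
			     1 /* write */, 0 /* force */, ttm->pages, NULL);
	up_read(&current->mm->mmap_sem);

	return (got == ttm->num_pages) ? 0 : -ENOMEM;
}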
>> If we don't allow mmapping user bos through TTM, we can use a special 
>> lockdep annotation when user bos are used. Normal bos would have mmap_sem 
>> as the outer lock and bo::reserve as the inner lock, while user bos would 
>> have it the other way around.
>>
>> This might complicate validation a little, since you would have to reserve 
>> and validate all user bos before any normal bos are reserved. But since 
>> this is meant to be a vmwgfx-specific optimization, I think it might be 
>> worth it.
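(Something like the following is what the annotation could look like; the
plain mutex stands in for the reserve lock, and hooking the keys up to
the real reservation object is the part that would need care.)

#include <linux/lockdep.h>
#include <linux/mutex.h>

/* Two lockdep classes for the reserve lock: normal bos are validated
 * with mmap_sem as the outer lock, user bos the other way around, and
 * with separate classes lockdep won't flag one ordering against the
 * other. */
static struct lock_class_key normal_bo_reserve_key;	/* mmap_sem -> reserve */
static struct lock_class_key user_bo_reserve_key;	/* reserve -> mmap_sem */

static void sketch_init_reserve(struct mutex *reserve, bool user_bo)
{
	mutex_init(reserve);
	lockdep_set_class(reserve, user_bo ? &user_bo_reserve_key :
					     &normal_bo_reserve_key);
}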
>
> Would that work (lockdep-wise) with user BO swapout as part of a normal BO 
> validation? During user BO swapout, we don't need mmap_sem, but the BO needs 
> to be reserved. But I guess it would work, since we use tryreserve when 
> walking the LRU lists?
Correct.
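(To spell that out: the LRU walk only ever trylocks the reserve, so
swapout never blocks on bo::reserve with other locks held. A sketch, with
made-up names:)

#include <linux/list.h>
#include <linux/mutex.h>

struct sketch_bo {
	struct list_head lru;
	struct mutex reserve;		/* stands in for bo::reserve */
};

/* Sketch: pick the first LRU entry that can be tryreserved; since the
 * walk never waits for a contended reserve, it cannot deadlock against
 * whoever holds it, regardless of what that holder does with mmap_sem. */
static struct sketch_bo *sketch_find_swapout_candidate(struct list_head *lru)
{
	struct sketch_bo *bo;

	list_for_each_entry(bo, lru, lru) {
		if (mutex_trylock(&bo->reserve))
			return bo;	/* caller swaps it out, then unlocks */
	}
	return NULL;
}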

>>
>>>> Releasing it is a bit silly. I think we should keep mmap_sem as the 
>>>> outer lock and have bo::reserve as the inner one, even if it might 
>>>> complicate support for user bos. I'm not sure what you can do with user 
>>>> bos that can't be done by allocating the same bo from the kernel first 
>>>> and mapping + populating it.
>>>>
>>>> ~Maarten
>>> Using a DMA API analogy, user BOs correspond to streaming DMA, whereas 
>>> normal BOs correspond to allocated DMA memory buffers. 
>>> We boost performance and save resources.
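(To spell out the analogy with the generic DMA API, not vmwgfx code:
streaming DMA maps pages the caller already owns for the duration of a
transfer, while a coherent allocation is a dedicated buffer the data
first has to be copied into.)

#include <linux/dma-mapping.h>
#include <linux/gfp.h>

/* "User bo" side of the analogy: map the caller's existing page. */
static dma_addr_t map_callers_page(struct device *dev, struct page *page,
				   size_t size)
{
	return dma_map_page(dev, page, 0, size, DMA_TO_DEVICE);
}

/* "Normal bo" side of the analogy: allocate a dedicated buffer that the
 * data then has to be copied into. */
static void *alloc_dedicated_buffer(struct device *dev, size_t size,
				    dma_addr_t *dma_handle)
{
	return dma_alloc_coherent(dev, size, dma_handle, GFP_KERNEL);
}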
>> Yeah, but it's vmwgfx-specific. Nouveau and radeon have dedicated copy 
>> engines that can be used. Flushing the vm and stalling is probably more 
>> expensive than performing a memcpy.
> In the end, I'm not sure it will be vmwgfx-specific once there is a working 
> example, and there are user-space APIs that will benefit from it. There are 
> other examples out there today that use streaming DMA to feed the DMA 
> engines, although not through TTM; see for example via_dmablit.c.
>
> Also, if we need a separate locking order for user BOs, what would be the 
> big benefit of having the locking order mmap_sem -> bo::reserve?
Deterministic locking. I'm worried about livelocks that could otherwise happen 
with the right timing, especially but not exclusively on -rt kernels.

~Maarten
