TTM Locking order of bo::reserve -> vm::mmap_sem

Maarten Lankhorst Mon, 21 Oct 2013 15:21:28 +0200

op 21-10-13 13:10, Thomas Hellstrom schreef:
> On 10/21/2013 12:24 PM, Maarten Lankhorst wrote:
>> op 21-10-13 12:10, Thomas Hellstrom schreef:
>>> On 10/21/2013 11:48 AM, Maarten Lankhorst wrote:
>>>> op 21-10-13 11:37, Thomas Hellstrom schreef:
>>>>> On 10/21/2013 11:01 AM, Maarten Lankhorst wrote:
>>>>>> op 21-10-13 10:48, Thomas Hellstrom schreef:
>>>>>>> Hi!
>>>>>>>
>>>>>>> As discussed previously the current locking order in TTM of these locks 
>>>>>>> is bo::reserve -> vm::mmap_sem. This leads to a hack in
>>>>>>> the TTM fault() handle to try and revert the locking order. If a 
>>>>>>> tryreserve failed, we tried to have the vm code release the mmap_sem() 
>>>>>>> and then schedule, to give the holder of bo::reserve a chance to 
>>>>>>> release the lock. This solution is no longer legal, since we've been 
>>>>>>> more or less kindly asked to remove the set_need_resched() call.
>>>>>>>
>>>>>>> Maarten has proposed to invert the locking order. I've previously said 
>>>>>>> I had no strong preference. The current locking order dates back from 
>>>>>>> the time when TTM wasn't using unmap_mapping_range() but walked the 
>>>>>>> page tables itself, updating PTEs as needed. Furthermore it was needed 
>>>>>>> for user bos that used get_user_pages() in the TTM populate and swap-in 
>>>>>>> methods. User-bos were removed some time ago but I'm looking at 
>>>>>>> re-adding them. They would suite the VMware model of cached-only pages 
>>>>>>> very well. I see uses both in the gallium API, XA's DMA functionality 
>>>>>>> and openCL.
>>>>>>>
>>>>>>> We would then need a somewhat nicer way to invert the locking order. 
>>>>>>> I've attached a solution that ups the mmap_sem and then reserves, but 
>>>>>>> due to how the fault API is done, we then need to release the reserve 
>>>>>>> and retry the fault. This of course opens up for starvation, but I 
>>>>>>> don't think starvation at this point is very likely: One thread being 
>>>>>>> refused to write or read from a buffer object because the GPU is 
>>>>>>> continously busy with it. If this *would* become a problem, it's 
>>>>>>> probably possible to modify the fault code to allow us to hold locks 
>>>>>>> until the retried fault, but that would be a bit invasive, since it 
>>>>>>> touches the arch code....
>>>>>>>
>>>>>>> Basically I'm proposing to keep the current locking order.
>>>>>> I'm not sure why we have to worry about mmap_sem lock being taken before 
>>>>>> bo::reserve. If we already hold mmap_sem,
>>>>>> no extra locking is needed for get_user_pages.
>>>>> Typically, they are populated outside of fault, as part of execbuf, where 
>>>>> we don't hold and don't want to hold mmap_sem(). In fact,
>>>>> user bo's should not be remappable through the TTM VM system. Anyway, we 
>>>>> need to grab the mmap_sem inside ttm_populate for user buffers.
>>>> If we don't allow mmapping user bo's through TTM, we can use special 
>>>> lockdep annotation when user-bo's are used. Normal bo's would have
>>>> mmap_sem outer lock, bo::reserve inner lock, while those bo's would have 
>>>> the other way around.
>>>>
>>>> This might complicate validation a little, since you would have to reserve 
>>>> and validate all user-bo's before any normal bo's are reserved. But since 
>>>> this
>>>> is meant to be a vmwgfx specific optimization I think it might be worth it.
>>> Would that work (lockdep-wise) with user BO swapout as part of a normal BO 
>>> validation? During user BO swapout, we don't need mmap_sem, but the BO 
>>> needs to be reserved. But I guess it would work, since we use tryreserve 
>>> when walking the LRU lists?
>> Correct.
>>
>>>>>>     Releasing it is a bit silly. I think we should keep mmap_sem as outer
>>>>>> lock, and have bo::reserve as inner, even if it might complicate support 
>>>>>> for user-bo's. I'm not sure what you can do
>>>>>> with user-bo's that can't be done by allocating the same bo from kernel 
>>>>>> first and map + populate it.
>>>>>>
>>>>>> ~Maarten
>>>>> Using DMA API analogy, user BOs correspond to using streaming DMA whereas 
>>>>> normal BOs correspond to alloced DMA memory buffers.
>>>>> We boost performance and save resources.
>>>> Yeah but it's vmwgfx specific. Nouveau and radeon have dedicated copy 
>>>> engines that can be used. Flushing the vm and stalling is probably more
>>>> expensive than performing a memcpy
>>> In the end, I'm not sure it will be vmwgfx specific once there is a working 
>>> example, and there are user-space APIs that will benefit from it. There are 
>>> other examples out there today that uses streaming DMA to feed the DMA 
>>> engines, although not through TTM, see for example via_dmablit.c.
>>>
>>> Also if we need a separate locking order for User BOs, what would be the 
>>> big benefit of having the locking order mmap_sem()->bo_reserve() ?
>> Deterministic locking. I fear about livelocks that could happen otherwise 
>> with the right timing. Especially but not exclusively on -rt kernels.
> But livelocks wouldn't be an issue anymore since we use a waiting reserve in 
> the retry path, right? The only issue we might theoretically be facing is 
> starvation in the fault path.
>
> With a two-step validation scheme I think we're up to real issues, like 
> bouncing ordinary BOs in and out of swap during a single execbuf...
I would need to see some code, but I think it can be avoided by having a 
slowpath similar to i915 if it fails.


Anyway have 2 threads fault on the same bo in different offsets, so fault 
handler is called twice on the same bo.

You could end up a loop, first one grabbed the bo lock without mmap_sem, second 
one does trylock, which fails then unlocks mmap_sem, and acquires the bo lock 
again. At which point the first thread acquired mmap_sem and does a trylock. So 
the same thing can theoretically happen infinitely over and over again.

~Maarten

TTM Locking order of bo::reserve -> vm::mmap_sem

Reply via email to