op 21-10-13 13:10, Thomas Hellstrom schreef: > On 10/21/2013 12:24 PM, Maarten Lankhorst wrote: >> op 21-10-13 12:10, Thomas Hellstrom schreef: >>> On 10/21/2013 11:48 AM, Maarten Lankhorst wrote: >>>> op 21-10-13 11:37, Thomas Hellstrom schreef: >>>>> On 10/21/2013 11:01 AM, Maarten Lankhorst wrote: >>>>>> op 21-10-13 10:48, Thomas Hellstrom schreef: >>>>>>> Hi! >>>>>>> >>>>>>> As discussed previously the current locking order in TTM of these locks >>>>>>> is bo::reserve -> vm::mmap_sem. This leads to a hack in >>>>>>> the TTM fault() handle to try and revert the locking order. If a >>>>>>> tryreserve failed, we tried to have the vm code release the mmap_sem() >>>>>>> and then schedule, to give the holder of bo::reserve a chance to >>>>>>> release the lock. This solution is no longer legal, since we've been >>>>>>> more or less kindly asked to remove the set_need_resched() call. >>>>>>> >>>>>>> Maarten has proposed to invert the locking order. I've previously said >>>>>>> I had no strong preference. The current locking order dates back from >>>>>>> the time when TTM wasn't using unmap_mapping_range() but walked the >>>>>>> page tables itself, updating PTEs as needed. Furthermore it was needed >>>>>>> for user bos that used get_user_pages() in the TTM populate and swap-in >>>>>>> methods. User-bos were removed some time ago but I'm looking at >>>>>>> re-adding them. They would suite the VMware model of cached-only pages >>>>>>> very well. I see uses both in the gallium API, XA's DMA functionality >>>>>>> and openCL. >>>>>>> >>>>>>> We would then need a somewhat nicer way to invert the locking order. >>>>>>> I've attached a solution that ups the mmap_sem and then reserves, but >>>>>>> due to how the fault API is done, we then need to release the reserve >>>>>>> and retry the fault. This of course opens up for starvation, but I >>>>>>> don't think starvation at this point is very likely: One thread being >>>>>>> refused to write or read from a buffer object because the GPU is >>>>>>> continously busy with it. If this *would* become a problem, it's >>>>>>> probably possible to modify the fault code to allow us to hold locks >>>>>>> until the retried fault, but that would be a bit invasive, since it >>>>>>> touches the arch code.... >>>>>>> >>>>>>> Basically I'm proposing to keep the current locking order. >>>>>> I'm not sure why we have to worry about mmap_sem lock being taken before >>>>>> bo::reserve. If we already hold mmap_sem, >>>>>> no extra locking is needed for get_user_pages. >>>>> Typically, they are populated outside of fault, as part of execbuf, where >>>>> we don't hold and don't want to hold mmap_sem(). In fact, >>>>> user bo's should not be remappable through the TTM VM system. Anyway, we >>>>> need to grab the mmap_sem inside ttm_populate for user buffers. >>>> If we don't allow mmapping user bo's through TTM, we can use special >>>> lockdep annotation when user-bo's are used. Normal bo's would have >>>> mmap_sem outer lock, bo::reserve inner lock, while those bo's would have >>>> the other way around. >>>> >>>> This might complicate validation a little, since you would have to reserve >>>> and validate all user-bo's before any normal bo's are reserved. But since >>>> this >>>> is meant to be a vmwgfx specific optimization I think it might be worth it. >>> Would that work (lockdep-wise) with user BO swapout as part of a normal BO >>> validation? During user BO swapout, we don't need mmap_sem, but the BO >>> needs to be reserved. But I guess it would work, since we use tryreserve >>> when walking the LRU lists? >> Correct. >> >>>>>> Releasing it is a bit silly. I think we should keep mmap_sem as outer >>>>>> lock, and have bo::reserve as inner, even if it might complicate support >>>>>> for user-bo's. I'm not sure what you can do >>>>>> with user-bo's that can't be done by allocating the same bo from kernel >>>>>> first and map + populate it. >>>>>> >>>>>> ~Maarten >>>>> Using DMA API analogy, user BOs correspond to using streaming DMA whereas >>>>> normal BOs correspond to alloced DMA memory buffers. >>>>> We boost performance and save resources. >>>> Yeah but it's vmwgfx specific. Nouveau and radeon have dedicated copy >>>> engines that can be used. Flushing the vm and stalling is probably more >>>> expensive than performing a memcpy >>> In the end, I'm not sure it will be vmwgfx specific once there is a working >>> example, and there are user-space APIs that will benefit from it. There are >>> other examples out there today that uses streaming DMA to feed the DMA >>> engines, although not through TTM, see for example via_dmablit.c. >>> >>> Also if we need a separate locking order for User BOs, what would be the >>> big benefit of having the locking order mmap_sem()->bo_reserve() ? >> Deterministic locking. I fear about livelocks that could happen otherwise >> with the right timing. Especially but not exclusively on -rt kernels. > But livelocks wouldn't be an issue anymore since we use a waiting reserve in > the retry path, right? The only issue we might theoretically be facing is > starvation in the fault path. > > With a two-step validation scheme I think we're up to real issues, like > bouncing ordinary BOs in and out of swap during a single execbuf... I would need to see some code, but I think it can be avoided by having a slowpath similar to i915 if it fails.
Anyway have 2 threads fault on the same bo in different offsets, so fault handler is called twice on the same bo. You could end up a loop, first one grabbed the bo lock without mmap_sem, second one does trylock, which fails then unlocks mmap_sem, and acquires the bo lock again. At which point the first thread acquired mmap_sem and does a trylock. So the same thing can theoretically happen infinitely over and over again. ~Maarten