Hi Dave,

> On Jan 30, 2026, at 10:01 PM, Dave Airlie <[email protected]> wrote:
>
> On Sat, 31 Jan 2026 at 07:14, Joel Fernandes <[email protected]> wrote:
>>
>>
>>
>>> On 1/29/2026 10:38 PM, John Hubbard wrote:
>>> On 1/29/26 5:59 PM, Joel Fernandes wrote:
>>>> On 1/29/26 8:12 PM, John Hubbard wrote:
>>>>> On 1/29/26 4:26 PM, Joel Fernandes wrote:
>>>>>> Based on the below discussion and research, I came up with some deadlock
>>>>>> scenarios that we need to handle in the v6 series of these patches.
>>>>>> [...]
>>>>>> memory allocations under locks that we need in the dma-fence signaling
>>>>>> critical path (when doing the virtual memory map/unmap)
>>>>>
>>>>> unmap? Are you seeing any allocations happening during unmap? I don't
>>>>> immediately see any, but that sounds surprising.
>>>>
>>>> Not allocations, but we are acquiring locks during unmap. My understanding
>>>> is that (at least some) unmaps also have to be done in the dma-fence
>>>> signaling critical path (the run stage), but Danilo/you can correct me if
>>>> I am wrong on that. We cannot avoid all locking, but those same locks
>>>> cannot be held in any other paths which do a memory allocation (as
>>>> mentioned in one of the deadlock scenarios); that is probably the main
>>>> thing to check for unmap.
>>>>
>>>
>>> Right, OK we are on the same page now: no allocations happening on unmap,
>>> but it can still deadlock, because the driver is typically going to
>>> use a single lock to protect both map- and unmap-related calls
>>> to the buddy allocator.
>>
>> Yes exactly!
>>
>>>
>>> For the deadlock above, I think a good way to break that deadlock is
>>> to not allow taking that lock in a fence signaling calling path.
>>>
>>> So during an unmap, instead of "lock, unmap/free, unlock" it should
>>> move the item to a deferred-free list, which is processed separately.
>>> Of course, this is a little complex, because the allocation and reclaim
>>> has to be aware of such lists if they get large.
>>
>> Yes, also avoiding GFP_KERNEL allocations while holding any of these mm locks
>> (whichever we take during map). The GPU buddy actually does GFP_KERNEL
>> allocations internally, which is problematic.
>>
>> Some solutions / next steps:
>>
>> 1. Allocating (VRAM and system memory) outside the mm locks, just before
>> acquiring them.
>>
>> 2. Pre-allocating both the VRAM and the system memory needed, before the DMA
>> fence critical paths (the issue is also to figure out how much memory to
>> pre-allocate for the page table pages based on the VM_BIND request. I think
>> we can analyze the page tables in the submit stage to make an estimate).
>>
>> 3. Unfortunately, I am using gpu-buddy when allocating a VA range in the Vmm
>> (called virt_buddy), which itself does GFP_KERNEL memory allocations in the
>> allocate path. I am not sure what to do yet about this. ISTR the maple tree
>> also has similar issues.
>>
>> 4. Using non-reclaimable memory allocations where pre-allocation or
>> pre-allocated memory pools are not possible (I'd like to avoid this #4 so we
>> don't fail allocations when memory is scarce).
>>
>> Will work on these issues for the v7. Thanks,
>
> The way this works on nouveau at least (and I haven't yet read the
> nova code in depth) is we have 4 stages of vmm page table mgmt:
>
> ref - locked with a ref lock - can allocate/free memory - just makes
> sure the page tables exist and are reference counted
> map - locked with a map lock - cannot allocate memory - fill in the
> PTEs in the page table
> unmap - locked with a map lock - cannot allocate memory - removes
> entries in PTEs
> unref - locked with a ref lock - can allocate/free memory - just drops
> references and frees (not sure if it ever merges).
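To check that I am reading this split correctly, here is roughly how I picture
it, written as kernel-style C rather than Rust just to keep the sketch short.
The struct, function names and lock choices below are mine (not nouveau's or
nova's), and the real locking will likely be finer grained:

/*
 * Rough sketch only: names are placeholders, the locks are stand-ins
 * for whatever granularity we end up with.
 */

#include <linux/mutex.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct vmm_sketch {
	struct mutex ref_lock;	/* stages that may allocate/free */
	spinlock_t map_lock;	/* stages in the fence-signaling path */
};

/* 1) ref: may sleep/allocate; makes sure page tables exist and are refcounted. */
int vmm_ref_range(struct vmm_sketch *vmm, u64 addr, u64 size)
{
	int ret = 0;

	mutex_lock(&vmm->ref_lock);
	/* allocate any missing page-table levels here (GFP_KERNEL is fine) */
	mutex_unlock(&vmm->ref_lock);
	return ret;
}

/* 2) map: fence-signaling path; must not allocate, only fills in PTEs. */
void vmm_map_range(struct vmm_sketch *vmm, u64 addr, u64 size, u64 pa)
{
	spin_lock(&vmm->map_lock);
	/* write PTEs into page tables guaranteed to exist by vmm_ref_range() */
	spin_unlock(&vmm->map_lock);
}

/* 3) unmap: fence-signaling path; must not allocate, only clears PTEs. */
void vmm_unmap_range(struct vmm_sketch *vmm, u64 addr, u64 size)
{
	spin_lock(&vmm->map_lock);
	/* clear PTEs; do not free page-table pages here */
	spin_unlock(&vmm->map_lock);
}

/* 4) unref: may sleep/free; drops references and frees empty page tables. */
void vmm_unref_range(struct vmm_sketch *vmm, u64 addr, u64 size)
{
	mutex_lock(&vmm->ref_lock);
	/* drop references and free now-empty page-table pages */
	mutex_unlock(&vmm->ref_lock);
}

If that matches the nouveau model, then the main invariant for nova is that
only the ref/unref stages ever run in a path that may allocate or that takes a
lock shared with an allocating path.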
Thanks for sharing this, yes this is similar to what I am coming up with. One
difference is that OpenRM (and the Linux kernel) use finer-grained locking, but
I think we can keep it simple initially, as nouveau does, and add that
complexity progressively. I have also appended a couple of rough sketches at
the end of this mail showing how I am currently thinking about the deferred
unref and the page-table pre-allocation.

Joel Fernandes

> So maps and unmaps can be in fence signalling paths, but unrefs are
> done in free job from a workqueue.
>
> Dave.
>>
>> --
>> Joel Fernandes
>>
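First, the unref side. Following the "unrefs are done in free job from a
workqueue" model above, the following is roughly what I have in mind for nova.
This is only a sketch: all of the names are hypothetical and the single global
list is just to keep the example short:

#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/types.h>
#include <linux/workqueue.h>

struct deferred_unref {
	struct list_head node;
	u64 addr;
	u64 size;
};

static LIST_HEAD(pending_unrefs);
static DEFINE_SPINLOCK(pending_lock);

static void process_unrefs(struct work_struct *work);
static DECLARE_WORK(unref_work, process_unrefs);

/*
 * Called from the fence-signaling path (run stage): no allocation and no
 * sleeping here, so @d must have been allocated earlier (e.g. at submit).
 */
static void queue_unref(struct deferred_unref *d)
{
	spin_lock(&pending_lock);
	list_add_tail(&d->node, &pending_unrefs);
	spin_unlock(&pending_lock);

	schedule_work(&unref_work);
}

/* Runs from the workqueue (free-job path): may sleep, allocate and free. */
static void process_unrefs(struct work_struct *work)
{
	struct deferred_unref *d, *tmp;
	LIST_HEAD(batch);

	spin_lock(&pending_lock);
	list_splice_init(&pending_unrefs, &batch);
	spin_unlock(&pending_lock);

	list_for_each_entry_safe(d, tmp, &batch, node) {
		/* drop page-table references and free empty levels here */
		list_del(&d->node);
		kfree(d);
	}
}

The reclaim-awareness John mentioned (not letting the deferred list grow
unboundedly) would sit on top of this, e.g. by having the allocation path kick
the worker when the list gets long; I have not sketched that part yet.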

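Second, the pre-allocation from item 2 of my earlier mail quoted above:
estimate a worst-case number of page-table pages for the VM_BIND range in the
submit stage, allocate them there with GFP_KERNEL, and only consume them in
the run stage. Again a sketch only; the names and the (deliberately
pessimistic) estimate are placeholders, not what I expect to land:

#include <linux/gfp.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/slab.h>

struct pt_prealloc {
	struct page **pages;
	unsigned int count;
	unsigned int used;
};

/*
 * Submit stage: may sleep and allocate. @npages is the number of leaf
 * pages the VM_BIND range covers; the estimate is an upper bound (one
 * extra table per level for ranges straddling a boundary).
 */
static int pt_prealloc_init(struct pt_prealloc *p, unsigned long npages,
			    unsigned int levels, unsigned int entries_per_pt)
{
	unsigned long span = npages;
	unsigned int i, need = 0;

	for (i = 0; i < levels; i++) {
		span = DIV_ROUND_UP(span, entries_per_pt);
		need += span + 1;
	}

	p->pages = kcalloc(need, sizeof(*p->pages), GFP_KERNEL);
	if (!p->pages)
		return -ENOMEM;

	for (i = 0; i < need; i++) {
		p->pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO);
		if (!p->pages[i])
			goto err_free;
	}

	p->count = need;
	p->used = 0;
	return 0;

err_free:
	while (i--)
		__free_page(p->pages[i]);
	kfree(p->pages);
	return -ENOMEM;
}

/* Run stage (fence-signaling path): hand out pre-allocated pages only. */
static struct page *pt_prealloc_get(struct pt_prealloc *p)
{
	if (WARN_ON(p->used >= p->count))
		return NULL;
	return p->pages[p->used++];
}

Unused pages would be given back from the free-job path, and the same idea
still has to extend to the gpu-buddy/virt_buddy internal allocations (item 3
above), which is the part I am least sure how to handle.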