Hi Nicolai,
yeah, that is a known issue.
You don't necessarily need to add all fences from the PD to the released
BO, but immediately starting to clear the PTEs would be a good idea.
amdgpu_gem_object_close() should call amdgpu_vm_clear_freed() if the
PD/PT are swapped in at that moment.
This leaves only a very small window where the application could access
freed memory while the PTEs are being cleared.
If we want to close even that window, we could let amdgpu_vm_clear_freed()
return the fence of the clear operation and add it to the BO in question.
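
To make the idea concrete, here is a minimal user-space sketch of that flow. It is a toy model only: every toy_* type and function is a hypothetical stand-in invented for illustration, not the real amdgpu or dma-fence API, and the real close path would also have to deal with reservations and error handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the real amdgpu/dma-fence types. */
struct toy_fence { bool signaled; };

struct toy_bo {
    struct toy_fence *fences[4];  /* fences that must signal before reuse */
    size_t num_fences;
};

struct toy_vm {
    bool pd_resident;             /* PD/PT currently swapped in? */
    bool ptes_mapped;             /* stale PTEs still point at the BO */
    struct toy_fence clear_fence; /* tracks the pending PTE clear */
};

/* Hypothetical variant of clear_freed that hands back the clear fence. */
static struct toy_fence *toy_vm_clear_freed(struct toy_vm *vm)
{
    vm->clear_fence.signaled = false; /* clear job queued, not yet done */
    return &vm->clear_fence;
}

static void toy_bo_add_fence(struct toy_bo *bo, struct toy_fence *f)
{
    assert(bo->num_fences < 4);
    bo->fences[bo->num_fences++] = f;
}

/* Close path: start clearing at once and fence the BO against reuse. */
static void toy_gem_object_close(struct toy_vm *vm, struct toy_bo *bo)
{
    if (!vm->pd_resident)
        return; /* page tables are swapped out, nothing to clear now */
    toy_bo_add_fence(bo, toy_vm_clear_freed(vm));
}

/* The (simulated) GPU finished the clear job. */
static void toy_clear_job_done(struct toy_vm *vm)
{
    vm->ptes_mapped = false;
    vm->clear_fence.signaled = true;
}

/* Memory may be handed to someone else only once all fences signaled. */
static bool toy_bo_idle(const struct toy_bo *bo)
{
    for (size_t i = 0; i < bo->num_fences; i++)
        if (!bo->fences[i]->signaled)
            return false;
    return true;
}
```

In this model the BO only becomes idle, and so eligible for reuse, after toy_clear_job_done() has run, which is the property the extra fence is meant to provide.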
Regards,
Christian.
On 22.03.2017 at 16:06, Nicolai Hähnle wrote:
Hi all,
there's a bit of a puzzle where I'm wondering whether there's a subtle
bug in the amdgpu kernel module.
Basically, the concern is that a buggy user space driver might trigger
a sequence like this:
1. Submit a CS that accesses some BO _without_ adding that BO to the
buffer list.
2. Free that BO.
3. Some other task re-uses the memory underlying the BO.
4. The CS is submitted to the hardware and accesses memory that is now
already in use by somebody else, since there has been no update to the
page tables to reflect the freed BO.
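
The four steps above can be modelled in a few lines of user-space C. This is only an illustrative sketch: the toy_* names, the owner IDs and the single-page "BO" are all assumptions made up for the example and do not correspond to real kernel structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model; owner IDs and field names are illustrative only. */
enum { OWNER_NONE = 0, OWNER_TASK_A = 1, OWNER_TASK_B = 2 };

struct toy_page {
    int owner;             /* who the memory currently belongs to */
};

struct toy_pte {
    struct toy_page *page; /* NULL once the mapping has been cleared */
    int mapped_for;        /* task whose VM holds this PTE */
};

/* Step 2: free the BO. The PTE is left untouched, as in the report. */
static void toy_free_bo(struct toy_page *page)
{
    page->owner = OWNER_NONE;
}

/* Step 3: another task takes over the same memory. */
static void toy_reuse(struct toy_page *page, int new_owner)
{
    page->owner = new_owner;
}

/* Step 4: the CS runs; returns true if it touches foreign memory. */
static bool toy_cs_hits_foreign_memory(const struct toy_pte *pte)
{
    return pte->page != NULL && pte->page->owner != pte->mapped_for;
}
```

Starting from a page owned and mapped by task A, calling toy_free_bo() and then toy_reuse() with OWNER_TASK_B makes toy_cs_hits_foreign_memory() return true, because the stale PTE still points at the page. Setting pte.page to NULL before the reuse, which stands in for a page table update, makes it return false.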
Obviously there's a user space bug in step 1, but the kernel must
still prevent the conflicting memory accesses, and I don't see where
it does.
amdgpu_gem_object_close takes a reservation of the BO and the page
directory, but then simply backs off that reservation rather than
adding a fence, which I suspect is necessary.
I believe that whenever we remove a BO from a VM, we must
unconditionally add the most recent page directory fence(?) to the BO.
Does that sound right?
Cheers,
Nicolai
_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx