On 2018-02-13 12:06 PM, Christian König wrote:
> Am 13.02.2018 um 17:42 schrieb Felix Kuehling:
>> On 2018-02-13 05:25 AM, Christian König wrote:
>>> Am 13.02.2018 um 00:23 schrieb Felix Kuehling:
>>>> On 2018-02-12 11:57 AM, Felix Kuehling wrote:
>>>>>>> I just thought of a slightly different approach I would consider
>>>>>>> realistic, without having thought through all the details: Adding
>>>>>>> KFD-specific memory management ioctls to the amdgpu device.
>>>>>>> call amdgpu_amdkfd_gpuvm functions from amdgpu ioctl functions
>>>>>>> of KFD ioctl functions. But we'd still have KFD ioctls for other
>>>>>>> and the new amdgpu ioctls would be KFD-specific and not useful for
>>>>>>> graphics applications. It's probably still several weeks of
>>>>>>> work, but
>>>>>>> shouldn't have major teething issues because the internal logic and
>>>>>>> functionality would be basically unchanged. It would just move the
>>>>>>> ioctls from one device to another.
>>>>>> My thinking went into a similar direction. But instead of
>>>>>> exposing the
>>>>>> KFD IOCTLs through the DRM FD, I would let the KFD import a DRM FD.
>>>>>> And then use the DRM FD in the KFD for things like the buffer
>>>>>> of a device, e.g. no separate IDR for BOs in the KFD but rather a
>>>>>> reference to the DRM FD.
>>>>> I'm not sure I understand this. With "DRM FD" I think you mean the
>>>>> device file descriptor. Then have KFD call file operation on the
>>>>> FD to
>>>>> call AMDGPU ioctls? Is there any precedent for this that I can
>>>>> look at
>>>>> for an example?
>>>> OK, I think this could work for finding a VM from a DRM file
>>>> struct file *filp = fget(fd);
>>>> struct drm_file *drm_priv = filp->private_data;
>>>> struct amdgpu_fpriv *drv_priv = file_priv->driver_priv;
>>>> struct amdgpu_vm *vm = &drv_priv->vm;
>>>> That would let us use the DRM VM instead of creating our own and would
>>>> avoid wasting a perfectly good VM that gets created by opening the DRM
>>>> device. We'd need a new ioctl to import VMs into KFD. But that's about
>>>> as far as I would take it.
>>>> We'd still need to add KFD-specific data to the VM. If we use an
>>>> existing VM, we can't wrap it in our own structure any more. We'd need
>>>> to come up with a different solution or extend struct amdgpu_vm with
>>>> what we need.
>>> Well feel free to extend the VM structure with a pointer for KFD data.
>> Some more thoughts about that: Currently the lifetime of the VM is tied
>> to the file descriptor. If we import it into KFD, we either have to make
>> struct amdgpu_vm reference counted, or we have to keep a reference to
>> the file descriptor in KFD just to keep the VM alive until we drop our
>> reference to it.
>>>> And we still need our own BO creation ioctl. Importing DMABufs adds
>>>> extra overhead (more system calls, file descriptors, GEM objects) and
>>>> doesn't cover some of our use cases (userptr). We also need our own
>>>> for managing buffer handles that are used with all our memory and VM
>>>> management functions.
>>> Yeah, well that is the second part of that idea.
>>> When you have the drm_file for the VM, you can also use it to call
>> KFD only has one device, so we need a buffer ID that's globally unique,
>> not per-device. Or we'd need to identify buffers by a pair of a GEM
>> handle and a device ID. Our BO handles are 64-bit, so we could pack both
>> into a single handle.
> Ah, yeah that is also a point I wanted to to talk about with you.
> The approach of using the same buffer object with multiple amdgpu
> devices doesn't work in general.
> We need separate TTM object for each BO in each device or otherwise we
> break A+A laptops.
I think it broke for VRAM BOs because we enabled P2P on systems that
didn't support it properly. But at least system memory BOs can be shared
quite well between devices and we do it all the time. I don't see how
you can have separate TTM objects referring to the same memory.
> That is also the reason we had to disable this feature again in the
> hybrid branches.
What you disabled on hybrid branches was P2P, which only affects
large-BAR systems. Sharing of system memory BOs between devices still
> So we need BO handles per device here or some other solution.
>> Either way, that still requires a GEM object, which adds not just a
>> minor bit of overhead. It includes a struct file * which points to a
>> shared memory file that gets created for each BO in drm_gem_object_init.
> HUI WHAT? Why the heck do we still do this?
> Sorry something went wrong here and yes that shmen file is completely
> superfluous even for the gfx side.
>>> And when you need to iterate over all the BOs you can just use the
>>> object_idr or the VM structure as well.
>> object_idr only gives us BOs allocated from a particular device. The VM
>> structure has various BO lists, but they only contain BOs that were
>> added to the VM. The same BO can be added multiple times or to multiple
>> VMs, or it may not be in any VM at all.
> Ok, yeah that probably isn't sufficient for the BO handling like you
> need it for eviction.
>> In later patch series we also need to track additional information for
>> KFD buffers in KFD itself. At that point we change the KFD IDR to point
>> to our own structure, which in turn points to the BO. For example our
>> BOs get their VA assigned at allocation time, and we later do reverse
>> lookups from the address to the BO using an interval tree.
> You can do this with the standard VM structure as well, that is needed
> for UVD and VCE anyway. See amdgpu_vm_bo_lookup_mapping.
Then we need to know which VM to search. If all we have is a pointer,
we'd have to try potentially all VMs.
amd-gfx mailing list