On 13.02.2018 at 17:56, Felix Kuehling wrote:
Each process gets a whole page of the doorbell aperture assigned to it.
The assumption is that amdgpu only uses the first page of the doorbell
aperture, so KFD uses all the rest. On GFX8 and before, the queue ID is
used as the offset into the doorbell page. On GFX9 the hardware does
some engine-specific doorbell routing, so we added another layer of
doorbell management that's decoupled from the queue ID.
Either way, an entire doorbell page gets mapped into user mode and user
mode knows the offset of the doorbells for specific queues. The mapping
is currently handled by kfd_mmap in kfd_chardev.c.
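
The layout described above can be sketched in a few lines of C. This is only an illustration, not the kfd code: the 4 KiB page size, the 4-byte doorbell slot, and the helper names `process_doorbell_page_offset` and `doorbell_offset_in_page` are assumptions made for the example. It models the GFX8-and-earlier case, where the queue ID is used directly as the index into the process's doorbell page.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; not taken from the driver. */
#define PAGE_SIZE_BYTES 4096u
#define DOORBELL_BYTES  4u   /* assumed size of one doorbell slot */

/*
 * Byte offset of a process's doorbell page within the aperture.
 * Page 0 is assumed to be reserved for amdgpu, so the first KFD
 * process (index 0) gets page 1, the next one page 2, and so on.
 */
static inline uint64_t process_doorbell_page_offset(uint32_t process_index)
{
    return ((uint64_t)process_index + 1u) * PAGE_SIZE_BYTES;
}

/*
 * GFX8-and-earlier style: the queue ID directly selects the slot
 * inside the process's doorbell page.
 */
static inline uint32_t doorbell_offset_in_page(uint32_t queue_id)
{
    /* The queue ID must fit into a single doorbell page. */
    assert(queue_id < PAGE_SIZE_BYTES / DOORBELL_BYTES);
    return queue_id * DOORBELL_BYTES;
}
```

On GFX9 the second helper would not apply as written, since the doorbell index is managed separately from the queue ID.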
Ok, wait a second. Taking a look at kfd_doorbell_mmap(), it almost looks
like you map different doorbells with the same offset depending on which
process is calling this.
Is that correct? If yes, then that would be illegal and a problem, if I'm
not completely mistaken.
Do you simply assume that after evicting a process it always needs to
be restarted without checking if it actually does something? Or how
does that work?
Ok, understood. Well that limits the usefulness of the whole eviction
With the later addition of GPU self-dispatch, a page-fault-based
mechanism wouldn't work any more. We have to restart the queues blindly
with a timer. See evict_process_worker, which schedules the restore with
a delayed worker.
which was send either by the GPU o
The user mode queue ABI specifies that user mode update both the
doorbell and a WPTR in memory. When we restart queues we (or the CP
firmware) use the WPTR to make sure we catch up with any work that was
submitted while the queues were unmapped.
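
A minimal sketch of that catch-up check, assuming free-running 32-bit read and write pointers that wrap around. The struct and function names are hypothetical and do not come from the driver or the CP firmware interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of one user mode queue for illustration. */
struct user_queue {
    uint32_t rptr;        /* position consumed so far */
    uint32_t wptr_shadow; /* WPTR that user mode wrote to memory */
};

/*
 * Number of entries submitted but not yet consumed. Unsigned
 * subtraction gives the right result across a 32-bit wrap.
 */
static inline uint32_t queue_pending_entries(const struct user_queue *q)
{
    return (uint32_t)(q->wptr_shadow - q->rptr);
}

/* True if work was submitted while the queue was unmapped. */
static inline bool queue_needs_catch_up(const struct user_queue *q)
{
    return queue_pending_entries(q) != 0;
}
```

Because the WPTR lives in memory, a doorbell write that was lost while the queue was unmapped does not lose the submission itself.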
Putting cross-process work dispatch aside for a moment, GPU self-dispatch
works only when there is work running on the GPU.
So you can still check if there is some work pending after you unmapped
everything and only restart the queues when there is new work based on
the page fault.
In other words, either there is work pending, and it doesn't matter if it
was sent by the GPU or by the CPU, or there is no work pending and we can
delay restarting everything until there is.
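
To make the idea concrete, here is a rough sketch of the decision I have in mind. The types and names are made up for illustration and are not existing kfd code; it assumes that comparing each queue's read and write pointers is enough to tell whether work is pending.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of a queue taken right after unmapping. */
struct evicted_queue {
    uint32_t rptr; /* position consumed so far */
    uint32_t wptr; /* position written by the submitter (CPU or GPU) */
};

enum restore_decision {
    RESTORE_NOW,             /* some queue still has pending work */
    WAIT_FOR_DOORBELL_FAULT  /* everything idle, wait for new work */
};

/*
 * If any queue has unconsumed entries, restart right away; who
 * submitted them does not matter. Otherwise leave the process
 * evicted until a doorbell page fault signals new work.
 */
static inline enum restore_decision
decide_restore(const struct evicted_queue *queues, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (queues[i].wptr != queues[i].rptr)
            return RESTORE_NOW;
    }
    return WAIT_FOR_DOORBELL_FAULT;
}
```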
amd-gfx mailing list