Hi Jerome,

Sorry for being a bit late to the discussion, and for the top posting.

But I think you are missing a very important point here, one which makes the whole discussion about how to implement this superfluous:

We already have functionality to access the content of a process's BOs for debugging purposes. It works similarly to what you described and simply uses the BO housekeeping structures in the driver to access the pages and VRAM locations in question.

See here for the implementation:
1. drm/ttm: Implement vm_operations_struct.access v2 (http://www.spinics.net/lists/dri-devel/msg147302.html)
2. drm/amdgpu: Implement ttm_bo_driver.access_memory callback v2 (http://www.spinics.net/lists/dri-devel/msg147303.html)

Those patches allow you to simply attach gdb to a process and access the content of every CPU-mapped buffer, even when that buffer is in CPU-invisible VRAM.
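For reference, with those patches applied the access path is just gdb's normal memory commands; the process name, mapping address and length below are placeholders, not values from this thread:

```shell
# Attach to the running process and dump 64 KiB of a CPU-mapped BO.
# With the vm_operations_struct.access callback wired up this works
# even when the BO currently lives in CPU-invisible VRAM.
gdb -p "$(pidof my_app)" -batch \
    -ex 'dump binary memory bo.bin 0x7f2a40000000 0x7f2a40010000'
```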

So the primary goal of this effort is *NOT* to make the BO content accessible to the debugger through the BO housekeeping, because that's something we already have.

The goal here is to walk the different page tables and hardware mapping facilities and access the data exactly the way the GPU would, in order to catch problems.

That works fine as long as the IOMMU is disabled, but when it is enabled the additional mapping breaks our neck: we don't know whether what the page table dumper produces is correct or not.

So what we need is just a way to translate dma addresses back to struct pages to check their validity.
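A minimal sketch of what such a translation helper could look like on the kernel side, assuming the device sits behind an IOMMU. The function name is hypothetical; iommu_get_domain_for_dev() and iommu_iova_to_phys() are the existing IOMMU API entry points:

```c
#include <linux/iommu.h>
#include <linux/mm.h>

/* Hypothetical helper: translate a DMA address handed to the GPU back
 * to the struct page backing it, so the page table dumper can check
 * its validity.  Without an IOMMU the dma_addr_t is (on x86, absent
 * swiotlb bouncing) the physical address itself. */
static struct page *dma_addr_to_page(struct device *dev, dma_addr_t addr)
{
	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
	phys_addr_t phys = domain ? iommu_iova_to_phys(domain, addr)
				  : (phys_addr_t)addr;

	if (!phys || !pfn_valid(PHYS_PFN(phys)))
		return NULL;	/* bogus entry in the GPU page table */

	return pfn_to_page(PHYS_PFN(phys));
}
```

This only gives the reverse (dma-address -> page) direction for system pages; VRAM addresses would still need to be recognized and handled separately.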

I've considered just adding this information to amdgpu_gem_info, but then we would get a page->dma-address mapping instead of the dma-address->page mapping we want.

Regards,
Christian.

Am 02.08.2017 um 06:42 schrieb Jerome Glisse:
On Tue, Aug 01, 2017 at 05:38:05PM -0400, Tom St Denis wrote:
On 01/08/17 05:01 PM, Jerome Glisse wrote:
Unless you can nop that in a config invariant fashion (like you can for
tracers) that's a NAK from the get go.  And we'd need to buffer them to be
practical since you might run the debugger out of sync with the application
(e.g. app hangs then you fire up umr to see what's going on).
Again, when you start snooping you can access all the existing bo
information and start monitoring events to keep userspace in sync
with any changes from the first snapshot you take.
You'd have to be able to do this post-mortem as well though.  And ideally I
don't really care about getting live updates (at least not yet though...).
The way it would work now is you read the ring and get IB pointers which you
then have to fetch and decode.  So all I care about now is what does the
pointer point to right now.

Though at this point I'm decoding a GPUVM address not a DMA nor physical
address so I then have to decode the GPUVM address "somehow" (assuming I
don't use my "unsafe" methods) then I can proceed to look up the buffer
based on the DMA address.
Everything is a GPU virtual address in the first place. It is then translated
to either a GPU VRAM address or a bus address. IIRC some of the DMA engines do
not use the GPU VM but directly use bus addresses and VRAM, but this can also
fit in what I am proposing. See at the end.
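The translation being discussed here is a straightforward top-down walk. The sketch below is illustrative only: read_gpu_mem(), the level count and the bit layout are hypothetical stand-ins (real amdgpu PTE formats are ASIC-specific), but the structure is what a debugger-side GPUVM decoder does:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical PTE layout: bit 0 = valid, bits [47:12] = address. */
#define PTE_VALID  (1ull << 0)
#define ADDR_MASK  0x0000fffffffff000ull

/* Assumed helper that reads 8 bytes of GPU-accessible memory
 * (VRAM or system pages) at a given address, e.g. via MMIO. */
uint64_t read_gpu_mem(uint64_t pa);

/* Walk a 4-level page table (9-bit indices, 4 KiB pages) the same
 * way the GPU's VM block would, reporting a fault on invalid PTEs. */
static bool gpuvm_walk(uint64_t pd_base, uint64_t va, uint64_t *pa_out)
{
	uint64_t entry = pd_base;
	int level;

	for (level = 3; level >= 0; level--) {
		unsigned int idx = (va >> (12 + 9 * level)) & 0x1ff;

		entry = read_gpu_mem((entry & ADDR_MASK) + idx * 8);
		if (!(entry & PTE_VALID))
			return false;	/* the GPU would fault here */
	}
	*pa_out = (entry & ADDR_MASK) | (va & 0xfff);
	return true;
}
```

The leaf addresses such a walk produces are exactly the DMA/bus addresses that need translating back to struct pages when an IOMMU is in the picture.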

(* being notified of changes might be useful later on.  We do have more
grandiose plans to integrate umr into gdb/etc functionality).

So you want to add something to the kernel just for a corner case, i.e.
debugging a GPU hang, and not for more generic tools. I don't think
the work needed for that is worth the effort for such a small use case.
You can't really read GPU data when it's in flight anyway.  Ask a GFX
hardware person about reading registers while the GPU is running and they'll
say "NAK."

So even if the application hasn't crashed if you want to inspect things you
need to halt the waves (effectively halting the GPU).
I thought you were also doing a tracing/perf monitoring tool, so I wrongly
assumed that you wanted things to keep going.


     - GPU device memory is not necessarily accessible (due to the PCIE BAR size
       limit; I know there are patches to allow remapping all GPU memory).
We have ways of accessing all of VRAM from basic MMIO accesses :)
Yes, and it is insane to have to write the VRAM address into one register and
read the VRAM value from another; insane from a tediousness point of view :)
I'd consider double buffering **all** of VRAM, in case you might want parts of
it, to be more insane.
I am not saying to buffer all of VRAM, only what is actually accessed; think
of it as a temporary bounce buffer of a couple of megabytes.

[...]

Except again you're looking at this from the lens that the KMD is working
perfectly.  We use umr (our debugger) to debug the kernel driver itself.
That means you need to be able to read/write MMIO registers, peek at VRAM,
decode VM addresses, read rings, etc...
And you can do that with the scheme i propose.
Not everything is a BO though ...
What isn't a bo? Everything inside the amdgpu driver is an amdgpu_bo unless
I missed something. I am ignoring amdkfd here as that one is easy.

[...]

I was saying that your design is restrictive and cannot be used for other
things, and thus it is hard to justify the work for a single use case.
I'm not married to my trace idea.  It just happens to work for user
applications and is better than nothing (and is very unobtrusive).  If you
want to take a crack at a proper interface that accomplishes the same (and
more) by all means I'll gladly adopt using it.

Just keep in mind we have no plans to remove our existing debugfs
facilities.

Well, maybe I will code it up later today on the train back home.
Good luck Fermat.  :-)
So here it is

https://cgit.freedesktop.org/~glisse/linux/log/?h=amdgpu-debug

Most of the plumbing is there; I am puzzled by the absence of an obvious
lock that protects the virtual address range from concurrent insert/remove.

I haven't finished the read method; what is missing is accessing the ttm.tt
object if there is one, or falling back to MMIO reads otherwise. Probably a
couple hundred lines of code to add this.

It is untested as I don't have the hardware.

Jérôme


_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu
