On Wed, Jan 09, 2019 at 11:41:00AM +0100, Ard Biesheuvel wrote:
> (adding Will who was part of a similar discussion before)
> 
> On Tue, 8 Jan 2019 at 19:35, Carsten Haitzler <carsten.haitz...@arm.com> 
> wrote:
> >
> > On 08/01/2019 17:07, Grant Likely wrote:
> >
> > FYI, I have a Radeon RX550 with amdgpu on my ThunderX2. Yes, it's a
> > server ARM (aarch64) system, but it works a charm, with two screens
> > attached. I did have to do the following:
> >
> > 1. Patch the kernel DRM code to force uncached mappings (the code
> > apparently assumes x86-style write-combining semantics):
> >
> > --- ./include/drm/drm_cache.h~  2018-08-12 21:41:04.000000000 +0100
> > +++ ./include/drm/drm_cache.h   2018-11-16 11:06:16.976842816 +0000
> > @@ -48,7 +48,7 @@
> >  #elif defined(CONFIG_MIPS) && defined(CONFIG_CPU_LOONGSON3)
> >         return false;
> >  #else
> > -       return true;
> > +       return false;
> >  #endif
> >  }
> >
> 
> OK, so this is rather interesting. First of all, this is the exact
> change we apply to the nouveau driver to work on SynQuacer, i.e.,
> demote all Normal non-cacheable mappings of memory exposed by the PCIe
> controller via a BAR to device mappings. On SynQuacer, we need this
> because of a known silicon bug in the integration of the PCIe IP.
> 
> However, the fact that even on TX2, you need device mappings to map
> RAM exposed via PCIe is rather troubling, and it has come up in the
> past as well. The problem is that the GPU driver stack on Linux,
> including VDPAU libraries and other userland pieces, assumes that
> memory exposed via PCIe has proper memory semantics, including the
> ability to perform unaligned accesses on it or use DC ZVA instructions
> to clear it. As we all know, these driver stacks are rather complex,
> and adding awareness to each level in the stack regarding whether a
> certain piece of memory is real memory or PCI memory is going to be
> cumbersome.
> 
> When we discussed this in the past, an ARM h/w engineer pointed out
> that normal-nc is fundamentally incompatible with AMBA or AXI or
> whatever we use on ARM to integrate these components at the silicon
> level.

FWIW, I still don't understand exactly what the point being made was in that
thread, but I do know that many of the assertions along the way were either
vague or incorrect. Yes, it's possible to integrate different buses in a way
that doesn't work, but I don't see anything "fundamental" about it.

> If that means we can only use device mappings, it means we will
> need to make intrusive changes to a *lot* of code to ensure it doesn't
> use memcpy() or do other things that device mappings don't tolerate on
> ARM.

Even if we got it working, it would probably be horribly slow.

> So, can we get the right people from the ARM side involved to clarify
> this once and for all?

Last time I looked at this code, the problem actually seemed to be that the
DRM core ends up trying to remap the CPU pages in ttm_set_pages_uc(). This
is a NOP for !x86, so I think we end up with the CPU using a cacheable
mapping but the device using a non-cacheable mapping, which could explain
the hang.

At the time, implementing set_pages_uc() to remap the linear mapping wasn't
feasible because it would preclude the use of block mappings, but now that
we're using page mappings by default maybe you could give it a try.

Will
_______________________________________________
cross-distro mailing list
cross-distro@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/cross-distro
