On Wed, Jan 09, 2019 at 11:41:00AM +0100, Ard Biesheuvel wrote:
> (adding Will who was part of a similar discussion before)
>
> On Tue, 8 Jan 2019 at 19:35, Carsten Haitzler <carsten.haitz...@arm.com> wrote:
> >
> > On 08/01/2019 17:07, Grant Likely wrote:
> >
> > FYI I have a Radeon RX550 with amdgpu on my thunder-x2. Yes, it's a
> > server ARM (aarch64) system, but it works a charm. 2 screens attached.
> > I did have to do the following:
> >
> > 1. Patch the kernel DRM code to force uncached mappings (the code
> > apparently assumes x86-style WC):
> >
> > --- ./include/drm/drm_cache.h~  2018-08-12 21:41:04.000000000 +0100
> > +++ ./include/drm/drm_cache.h   2018-11-16 11:06:16.976842816 +0000
> > @@ -48,7 +48,7 @@
> >  #elif defined(CONFIG_MIPS) && defined(CONFIG_CPU_LOONGSON3)
> >         return false;
> >  #else
> > -       return true;
> > +       return false;
> >  #endif
> >  }
>
> OK, so this is rather interesting. First of all, this is the exact
> change we apply to the nouveau driver to work on SynQuacer, i.e. we
> demote all Normal-NC (normal non-cacheable) mappings of memory exposed
> by the PCIe controller via a BAR to Device mappings. On SynQuacer we
> need this because of a known silicon bug in the integration of the
> PCIe IP.
>
> However, the fact that even on TX2 you need Device mappings to map
> RAM exposed via PCIe is rather troubling, and it has come up in the
> past as well. The problem is that the GPU driver stack on Linux,
> including the VDPAU libraries and other userland pieces, assumes
> throughout that memory exposed via PCIe has proper memory semantics,
> including the ability to perform unaligned accesses on it or to use
> DC ZVA instructions to clear it. As we all know, these driver stacks
> are rather complex, and adding awareness at each level of the stack
> of whether a certain piece of memory is real memory or PCI memory is
> going to be cumbersome.
> When we discussed this in the past, an ARM h/w engineer pointed out
> that Normal-NC is fundamentally incompatible with AMBA or AXI or
> whatever we use on ARM to integrate these components at the silicon
> level.
FWIW, I still don't understand exactly what point was being made in that
thread, but I do know that many of the assertions along the way were
either vague or incorrect. Yes, it's possible to integrate different
buses in a way that doesn't work, but I don't see anything "fundamental"
about it.

> If that means we can only use Device mappings, it means we will
> need to make intrusive changes to a *lot* of code to ensure it doesn't
> use memcpy() or do other things that Device mappings don't tolerate on
> ARM. Even if we got it working, it would probably be horribly slow.
>
> So, can we get the right people from the ARM side involved to clarify
> this once and for all?

Last time I looked at this code, the problem actually seemed to be that
the DRM core ends up trying to remap the CPU pages in
ttm_set_pages_uc(). This is a NOP for !x86, so I think we end up with
the CPU using a cacheable mapping but the device using a non-cacheable
mapping, which could explain the hang.

At the time, implementing set_pages_uc() to remap the linear mapping
wasn't feasible because it would preclude the use of block mappings, but
now that we're using page mappings by default, maybe you could give it a
try.

Will

_______________________________________________
cross-distro mailing list
cross-distro@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/cross-distro