On 09/01/2019 11:54, Ard Biesheuvel wrote:
> On Wed, 9 Jan 2019 at 12:05, Will Deacon <will.dea...@arm.com> wrote:
>> On Wed, Jan 09, 2019 at 11:41:00AM +0100, Ard Biesheuvel wrote:
>>> (adding Will who was part of a similar discussion before)
>>>
>>> On Tue, 8 Jan 2019 at 19:35, Carsten Haitzler <carsten.haitz...@arm.com> 
>>> wrote:
>>>> On 08/01/2019 17:07, Grant Likely wrote:
>>>>
>>>> FYI I have a Radeon RX550 with amdgpu on my thunder-x2. yes - it's a
>>>> server ARM (aarch64) system, but it works a charm. 2 screens attached. I
>>>> did have to do the following:
>>>>
>>>> 1. patch kernel DRM code to force uncached mappings (the code apparently
>>>> assumes WC x86-style):
>>>>
>>>> --- ./include/drm/drm_cache.h~  2018-08-12 21:41:04.000000000 +0100
>>>> +++ ./include/drm/drm_cache.h   2018-11-16 11:06:16.976842816 +0000
>>>> @@ -48,7 +48,7 @@
>>>>  #elif defined(CONFIG_MIPS) && defined(CONFIG_CPU_LOONGSON3)
>>>>         return false;
>>>>  #else
>>>> -       return true;
>>>> +       return false;
>>>>  #endif
>>>>  }
>>>>
>>> OK, so this is rather interesting. First of all, this is the exact
>>> change we apply to the nouveau driver to work on SynQuacer, i.e.,
>>> demote all normal non-cacheable mappings of memory exposed by the PCIe
>>> controller via a BAR to device mappings. On SynQuacer, we need this
>>> because of a known silicon bug in the integration of the PCIe IP.
>>>
>>> However, the fact that even on TX2, you need device mappings to map
>>> RAM exposed via PCIe is rather troubling, and it has come up in the
>>> past as well. The problem is that the GPU driver stack on Linux,
>>> including VDPAU libraries and other userland pieces all assume that
>>> memory exposed via PCIe has proper memory semantics, including the
>>> ability to perform unaligned accesses on it or use DC ZVA instructions
>>> to clear it. As we all know, these driver stacks are rather complex,
>>> and adding awareness to each level in the stack regarding whether a
>>> certain piece of memory is real memory or PCI memory is going to be
>>> cumbersome.
>>>
>>> When we discussed this in the past, an ARM h/w engineer pointed out
>>> that normal-nc is fundamentally incompatible with AMBA or AXI or
>>> whatever we use on ARM to integrate these components at the silicon
>>> level.
>> FWIW, I still don't understand exactly what the point being made was in that
>> thread, but I do know that many of the assertions along the way were either
>> vague or incorrect. Yes, it's possible to integrate different buses in a way
>> that doesn't work, but I don't see anything "fundamental" about it.
>>
> OK
>
>>> If that means we can only use device mappings, it means we will
>>> need to make intrusive changes to a *lot* of code to ensure it doesn't
>>> use memcpy() or do other things that device mappings don't tolerate on
>>> ARM.
>> Even if we got it working, it would probably be horribly slow.
>>
>>> So, can we get the right people from the ARM side involved to clarify
>>> this once and for all?
>> Last time I looked at this code, the problem actually seemed to be that the
>> DRM core ends up trying to remap the CPU pages in ttm_set_pages_uc(). This
>> is a NOP for !x86, so I think we end up with the CPU using a cacheable
>> mapping but the device using a non-cacheable mapping, which could explain
>> the hang.
>>
>> At the time, implementing set_pages_uc() to remap the linear mapping wasn't
>> feasible because it would preclude the use of block mappings, but now that
>> we're using page mappings by default maybe you could give it a try.
>>
> OK, so I jumped to the conclusion that the similarity of the hack is
> due to the similarity of the issue it addresses, but this may not be
> the case.
>
> Carsten, could you please explain /why/ this first change is required?
> It appears to only affect GTT mappings, and not VRAM mappings, both of
> which are essentially PCIe memory, only the former may be backed by
> system memory under the hood.
>
> In any case, this is getting off-topic so I propose we trim the cc
> list a bit if we continue this discussion.
>
Well, the only two drivers that use drm_arch_can_wc_memory() are radeon and
amdgpu, so we're limiting the effects to those. They use it to set up
memory mappings and flags. If you look at the relevant code in the amdgpu
driver:

        bo->flags = bp->flags;

#ifdef CONFIG_X86_32
        /* XXX: Write-combined CPU mappings of GTT seem broken on 32-bit
         * See https://bugs.freedesktop.org/show_bug.cgi?id=84627
         */
        bo->flags &= ~AMDGPU_GEM_CREATE_CPU_GTT_USWC;
#elif defined(CONFIG_X86) && !defined(CONFIG_X86_PAT)
        /* Don't try to enable write-combining when it can't work, or things
         * may be slow
         * See https://bugs.freedesktop.org/show_bug.cgi?id=88758
         */

#ifndef CONFIG_COMPILE_TEST
#warning Please enable CONFIG_MTRR and CONFIG_X86_PAT for better performance \
         thanks to write-combining
#endif

        if (bo->flags & AMDGPU_GEM_CREATE_CPU_GTT_USWC)
                DRM_INFO_ONCE("Please enable CONFIG_MTRR and CONFIG_X86_PAT for "
                              "better performance thanks to write-combining\n");
        bo->flags &= ~AMDGPU_GEM_CREATE_CPU_GTT_USWC;
#else
        /* For architectures that don't support WC memory,
         * mask out the WC flag from the BO
         */
        if (!drm_arch_can_wc_memory())
                bo->flags &= ~AMDGPU_GEM_CREATE_CPU_GTT_USWC;
#endif

You can see that it's all about WC, and even on certain x86 setups it
doesn't work, so it gets turned off there. Not so on ARM. My understanding
is that ARM can't do WC in a guaranteed way like x86 can, so turning it off
is the right thing to do anyway, and no one bothered because no one really
tests AMD devices on ARM (much), or doesn't bother getting patches back
upstream. Yes, it only affects the GTT, but that is sufficient to take
things from totally broken (nothing even comes up) to totally working for
me. I haven't looked, but I assume the driver is using that mapping for
control structures or something, because my issues were with the command
queue sequence numbers and so on, not large volumes of data like the
textures themselves.
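If WC really can't be relied on here, a tidier form of the drm_cache.h hack
above might be to opt the architecture out explicitly instead of flipping the
generic fallback for everyone. A hypothetical, untested sketch (CONFIG_ARM64
stands in for the kernel's Kconfig symbol; the Loongson case is kept as in
the existing header):

```c
#include <stdbool.h>

/*
 * Hypothetical, untested sketch of include/drm/drm_cache.h: rather than
 * changing the generic fallback to false (as in the patch above), add an
 * explicit arm64 case so other architectures keep their current behaviour.
 */
static inline bool drm_arch_can_wc_memory(void)
{
#if defined(CONFIG_MIPS) && defined(CONFIG_CPU_LOONGSON3)
        return false;
#elif defined(CONFIG_ARM64)
        /* WC mappings of PCIe BARs are not reliably usable on these parts */
        return false;
#else
        return true;
#endif
}
```

That way x86 keeps its WC path and only the platforms known to misbehave
lose it.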

_______________________________________________
cross-distro mailing list
cross-distro@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/cross-distro
