On Wed, 9 Jan 2019 at 16:30, Carsten Haitzler <carsten.haitz...@arm.com> wrote:
>
> On 09/01/2019 14:48, Ard Biesheuvel wrote:
> > On Wed, 9 Jan 2019 at 15:38, Carsten Haitzler <carsten.haitz...@arm.com> 
> > wrote:
> >> On 09/01/2019 14:36, Ard Biesheuvel wrote:
> >>
> >> On Wed, 9 Jan 2019 at 15:34, Carsten Haitzler <carsten.haitz...@arm.com> 
> >> wrote:
> >>
> >> On 09/01/2019 14:33, Bero Rosenkränzer wrote:
> >>
> >> On Wed, 9 Jan 2019 at 14:55, Carsten Haitzler <carsten.haitz...@arm.com> 
> >> wrote:
> >>
> >> My understanding is that ARM can't do "WC" in a guaranteed way like x86,
> >> so turning it off is the right thing to do anyway,
> >>
> >> My understanding too.
> >>
> >> FWIW I've added the fix to the OpenMandriva distro kernel
> >> https://github.com/OpenMandrivaSoftware/linux/commit/657041c5665c681d4519cf8297e0b8799b929f86
> >> Let's see if any user starts screaming ;)
> >>
> >> ttyl
> >> bero
> >>
> >> Let's see. I have put a patch into the internal kernel patch review 
> >> before I send it off to dri-devel. It's exactly your patch, just with a 
> >> commit log explaining why.
> >>
> >> So what exactly is it about x86 style wc that ARM cannot do?
> >>
> >> From Pavel Shamis here at ARM:
> >>
> >> "Short version.
> >>
> >> X86 has well-defined behavior for WC memory: it combines multiple 
> >> consecutive stores (which have to be aligned to the cache line) into 64B 
> >> cache line writes over PCIe.
> >>
> >> On Arm WC corresponds to Normal NC. Arm uarch does not do combining to 
> >> cache line size. On some uarch we do 16B combining but not cache line.
> >>
> >> The first uarch that will be doing cache line size combining is Aries.
> >>
> >>  It is important to note that WC is an opportunistic optimization and the 
> >> software/hardware should not make an assumption that it always “combines” 
> >> (true for x86 and arm)"
> >>
> > OK, so that only means that ARM WC mappings may behave more like x86
> > uncached mappings than x86 WC mappings. It does not explain why things
> > break if we use them.
> >
> > The problem with using uncached mappings here is that it breaks use
> > cases that expect memory semantics, for unaligned access or DC ZVA
> > instructions. At least VDPAU on nouveau breaks due to this, and likely
> > many more other use cases as well.
>
> For amdgpu though it works, and this is an AMD+Radeon-only code path. At
> least it works on the only ARM system I have an AMD GPU plugged into.
> You need the same fix for SynQuacer. Getting a fix like this upstream
> will alleviate a reasonable amount of pain for end users even if it is
> not perfect.
>
> I do not plan on going any further with this patch because it's for my
> tx2 and that is my ONLY workstation at work and it takes like 10 minutes
> per reboot cycle. I have many things to do and getting my gfx card to a
> working state was the primary focus. Spending days just rebooting to try
> things with something I am not familiar with (the ttm mappings) is not
> something I have time for. Looking at the history of other bugs that
> affect WC/UC mappings in radeon/amdgpu shows that this is precisely the
> kind of fix that has been done multiple times in the past for x86 and
> obviously some MIPS and PPC systems. There are mountains of precedent that
> this is a quick and simple fix that has been implemented many times in
> the past, so from that point of view I think it's a decent fix in and of
> itself when it comes to time vs. reward.
>

I can confirm that this change fixes all the issues I observed on AMD
Seattle with HD5450 and HD7450 cards which use the Radeon driver (not
the amdgpu one).

So I will attempt to dig into this a bit further myself, and hopefully
find something that carries over to amdgpu as well, so I may ask you
to test something if I do.

> It may not be perfect, but it is better than it was and other MIPS/PPC
> and even x86 32bit systems already need this kind of fix. In the same
> way it seems ARM needs it too and no one to date has bothered to upstream
> it. I'd rather things improve for at least some set of people than not
> improve at all for an undefined amount of time. Note that working is
> an improvement over "fast but doesn't work" in my book. :) Don't get me
> wrong. Looking for a better fix in the meantime, if one could exist, is a
> positive thing. It's just not something I can get stuck into, as noted above.
>

I'd just like to see if we can fix this properly before we upstream a hack.
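
For reference, the combining granularity Pavel describes can be pictured
with a toy model in C. This is only a sketch under stated assumptions: the
function name count_bus_writes is made up for illustration, stores are
assumed to be sequential and naturally aligned, and real write buffers may
flush early for many reasons, which is why software must not rely on
combining actually happening.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model, not real hardware behaviour: count how many bus writes a
 * linear fill of total_bytes produces when sequential stores of
 * store_size bytes are merged within aligned windows of combine_size
 * bytes. combine_size == store_size models "no combining at all".
 */
size_t count_bus_writes(size_t total_bytes, size_t store_size,
                        size_t combine_size)
{
    /* Assumption: stores never straddle a combining window. */
    assert(store_size > 0);
    assert(combine_size >= store_size);
    assert(combine_size % store_size == 0);

    size_t writes = 0;
    size_t open_window = (size_t)-1;    /* no window opened yet */

    for (size_t off = 0; off + store_size <= total_bytes; off += store_size) {
        size_t window = off / combine_size;

        if (window != open_window) {
            /* Entering a new window costs one more bus write. */
            writes++;
            open_window = window;
        }
    }
    return writes;
}
```

With 8-byte stores over a 256-byte range, this gives 4 writes for a 64B
combining window, 16 writes for a 16B window, and 32 writes with no
combining.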
_______________________________________________
cross-distro mailing list
cross-distro@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/cross-distro
