On Wed, 9 Jan 2019 at 16:30, Carsten Haitzler <carsten.haitz...@arm.com> wrote: > > On 09/01/2019 14:48, Ard Biesheuvel wrote: > > On Wed, 9 Jan 2019 at 15:38, Carsten Haitzler <carsten.haitz...@arm.com> > > wrote: > >> On 09/01/2019 14:36, Ard Biesheuvel wrote: > >> > >> On Wed, 9 Jan 2019 at 15:34, Carsten Haitzler <carsten.haitz...@arm.com> > >> wrote: > >> > >> On 09/01/2019 14:33, Bero Rosenkränzer wrote: > >> > >> On Wed, 9 Jan 2019 at 14:55, Carsten Haitzler <carsten.haitz...@arm.com> > >> wrote: > >> > >> My understanding is that ARM can't do "WC" in a guaranteed way like x86, > >> so turning it off is the right thing to do anyway, > >> > >> My understanding too. > >> > >> FWIW I've added the fix to the OpenMandriva distro kernel > >> https://github.com/OpenMandrivaSoftware/linux/commit/657041c5665c681d4519cf8297e0b8799b929f86 > >> Let's see if any user starts screaming ;) > >> > >> ttyl > >> bero > >> > >> let's see,. i have put in a patch to the internal kernel patch review > >> before i send off to dri-devel. it's exactly your patch there just with a > >> commit log explaining why. > >> > >> So what exactly is it about x86 style wc that ARM cannot do? > >> > >> From Pavel Shamis here at ARM: > >> > >> "Short version. > >> > >> X86 has well define behavior for WC memory – it combines multiples > >> consecutive stores (has to be aligned to the cache line ) in 64B cache > >> line writes over PCIe. > >> > >> On Arm WC corresponds to Normal NC. Arm uarch does not do combining to > >> cache line size. On some uarch we do 16B combining but not cache line. > >> > >> The first uarch that will be doing cache line size combining is Aries. > >> > >> It is important to note that WC is an opportunistic optimization and the > >> software/hardware should not make an assumption that it always “combines” > >> (true for x86 and arm)" > >> > > OK, so that only means that ARM WC mappings may behave more like x86 > > uncached mappings than x86 WC mappings. It does not explain why things > > break if we use them. > > > > The problem with using uncached mappings here is that it breaks use > > cases that expect memory semantics, for unaligned access or DC ZVA > > instructions. At least VDPAU on nouveau breaks due to this, and likely > > many more other use cases as well. > > For amdgpu though it works and this is and AMD+Radeon only code path. At > least it works on the only ARM system I have an AMD GPU plugged into. > you need the same fix for SynQuacer. Gettign a fix upstream like this > will alleaviet a reasonable amount of pain for end-users even if not > perfect. > > I do not plan on going any further with this patch because it's for my > tx2 and that is my ONLY workstation at work and it takes like 10 minutes > per reboot cycle. I have many things to do and getting my gfx card to a > working state was the primary focus. Spending days just rebooting to try > things with something I am not familiar with (thwe ttm mappings) is not > something I have time for. Looking at the history of other bugs that > affect WC/UC mappings in radeon/madgpu shows that this is precisely the > kind of fix that has been done multiple times in the past for x86 and > obviously some MIPS and PPC systems. there's mountains of precedent that > this is a quick and simple fix that has been implemented many time in > the past, so from that point of view I think its a decent fix in and of > itself when it comes to time vs. reward. >
I can confirm that this change fixes all the issues I observed on AMD Seattle with HD5450 and HD7450 cards which use the Radeon driver (not the amdpgu one) So I will attempt to dig into this a bit further myself, and hopefully find something that carries over to amdgpu as well, so I may ask you to test something if I do. > It may not be perfect, but it is better than it was and other MIPS/PPC > and even x86 32bit systems already need this kind of fix. In the same > way it seems ARM needs it too and no one to date has bothered upstream. > I'd rather things improve for at least some set of people than they do > not improve at all for an undefined amount of time. Note that working is > an improvement to "fast but doesn't work" in my book. :) Don't get me > wrong. Looking for a better fix in the meantime,if one could exist, is a > positive thing. It's not something I can get stuck into as above. > I'd just like to see if we can fix properly before we upstream a hack. _______________________________________________ cross-distro mailing list cross-distro@lists.linaro.org https://lists.linaro.org/mailman/listinfo/cross-distro