On 2020-09-16 18:46, Rob Herring wrote:
> On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig
> <alyssa.rosenzw...@collabora.com> wrote:
>>
>>> So I get a performance regression with the dma-coherent approach, even if it's
>>> clearly the cleaner.
>>
>> That's bizarre -- this should really be the faster of the two.
>
> Coherency may not be free. Cortex-A9, for example, had something like
> 4x slower memcpy if SMP was enabled. I don't know if there's
> anything going on like that specifically here. If there are never any
> CPU accesses mixed in with kmscube, then there would be no benefit to

There will still be CPU-side benefits: no time spent cache-cleaning every BO upon allocation, and less overhead when writing out descriptors in the first place (the writes go to cacheable rather than non-cacheable memory).
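
To put a rough shape on the cache-cleaning part, here is a back-of-envelope sketch, not a measurement. It assumes a clean by address has to walk the BO one cache line at a time. The 64-byte line size, the per-line cost and the BO sizes are all placeholder assumptions; the helper names are made up for illustration.

```python
# Back-of-envelope only: every number here is an assumption, not a measurement.
CACHE_LINE_BYTES = 64  # assumed line size; check the actual CPU


def lines_to_clean(bo_bytes, line_bytes=CACHE_LINE_BYTES):
    """Cache lines a per-line clean has to walk for one BO (rounded up)."""
    return -(-bo_bytes // line_bytes)  # ceiling division


def clean_cost_us(bo_bytes, ns_per_line, line_bytes=CACHE_LINE_BYTES):
    """Estimated CPU time, in microseconds, to clean one BO."""
    return lines_to_clean(bo_bytes, line_bytes) * ns_per_line / 1000.0


if __name__ == "__main__":
    # Hypothetical BO sizes and a placeholder per-line cost of 5 ns.
    for size in (4096, 1 << 20, 8 << 20):
        print(size, lines_to_clean(size), clean_cost_us(size, ns_per_line=5.0))
```

The point is only that the work scales linearly with BO size and is paid on every allocation in the non-coherent case, whereas the coherent case skips it entirely. Real per-line costs would have to be measured on the platform in question.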

I haven't tried the NSh hack on Juno, but I don't see any notable performance issue as-is - kmscube hits a solid 60FPS from the off (now that it works without spewing faults). Given that the hardware on Juno can be generously described as "less good", it would certainly be interesting to figure out what difference is at play here...

The usual argument against I/O coherency is that it adds latency to every access, but if you already have a coherent interconnect anyway then the sensible answer to that is implementing decent snoop filters, rather than making software more complicated.
