On 2020-09-16 18:46, Rob Herring wrote:
> On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig wrote:
>>
>>> So I get a performance regression with the dma-coherent approach, even if it's
>>> clearly the cleaner.
>>
>> That's bizarre -- this should really be the faster of the two.
>
> Coherency may not be free. CortexA9 had something like 4x slower
> memcpy if SMP was enabled as an example. I don't know if there's
> anything going on like that specifically here. If there's never any
> CPU accesses mixed in with kmscube, then there would be no benefit to
> coherency.

There will still be CPU benefits in terms of not having to spend time
cache-cleaning every BO upon allocation, and less overhead on writing
out descriptors in the first place (due to cacheable vs. non-cacheable).

I haven't tried the NSh hack on Juno, but I don't see any notable
performance issue as-is - kmscube hits a solid 60FPS from the off (now
that it works without spewing faults). Given that the hardware on Juno
can be generously described as "less good", it would certainly be
interesting to figure out what difference is at play here...

The usual argument against I/O coherency is that it adds latency to
every access, but if you already have a coherent interconnect anyway
then the sensible answer to that is implementing decent snoop filters,
rather than making software more complicated.