On 05/10/2020 15:50, Boris Brezillon wrote:
On Tue, 22 Sep 2020 15:16:48 +0100
Robin Murphy <robin.mur...@arm.com> wrote:

Midgard GPUs have ACE-Lite master interfaces which allow systems to
integrate them in an I/O-coherent manner. It seems that from the GPU's
viewpoint, the rest of the system is its outer shareable domain, and so
even when snoop signals are wired up, they are only emitted for outer
shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
indeed get coherent pagetable walks working nicely for the coherent
T620 in the Arm Juno SoC.

Reviewed-by: Steven Price <steven.pr...@arm.com>
Tested-by: Neil Armstrong <narmstr...@baylibre.com>
Signed-off-by: Robin Murphy <robin.mur...@arm.com>
---
  drivers/iommu/io-pgtable-arm.c | 11 ++++++++++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index dc7bcf858b6d..b4072a18e45d 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -440,7 +440,13 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
                                << ARM_LPAE_PTE_ATTRINDX_SHIFT);
        }
-       if (prot & IOMMU_CACHE)
+       /*
+        * Also Mali has its own notions of shareability wherein its Inner
+        * domain covers the cores within the GPU, and its Outer domain is
+        * "outside the GPU" (i.e. either the Inner or System domain in CPU
+        * terms, depending on coherency).
+        */
+       if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
                pte |= ARM_LPAE_PTE_SH_IS;
        else
                pte |= ARM_LPAE_PTE_SH_OS;

Actually, it still doesn't work on s922x :-/. For it to work
correctly, I need to drop the outer shareable flag here.

The logic here does seem a bit odd. Originally it was:

IOMMU_CACHE -> Inner shared (value 3)
!IOMMU_CACHE -> Outer shared (value 2)

For Mali we're forcing everything to the second option. But Mali, being Mali, doesn't do things the same way as LPAE, so for Mali we have:

0 - not shared
1 - reserved
2 - inner(*) and outer shareable
3 - inner shareable only

(*) where "inner" means internal to the GPU, and "outer" means shared with the CPU "inner". Very confusing!

So originally we had:
IOMMU_CACHE -> not shared with CPU (only internally in the GPU)
!IOMMU_CACHE -> shared with CPU

The change above gets us to "always shared". Dropping the SH_OS bit would get us to not even shareable between cores, which doesn't sound like what we want.

It's not at all clear to me why the change helps, but I suspect we want at least "inner" shareable.
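
To make the mapping above concrete, here is a rough sketch in C. It is not driver code: the enum and the three helper functions are hypothetical names made up for illustration, and the only facts it encodes are the 2-bit field values and the Mali interpretation listed above.

```c
#include <stdbool.h>

/* 2-bit shareability field values as listed in this thread. */
enum sh_field {
	SH_NOT_SHARED = 0,
	SH_RESERVED   = 1,
	SH_OS         = 2,	/* LPAE "outer shareable" */
	SH_IS         = 3,	/* LPAE "inner shareable" */
};

/* Hypothetical helper: what each value means to the Mali MMU. */
const char *mali_sh_meaning(unsigned int sh)
{
	switch (sh) {
	case SH_NOT_SHARED:
		return "not shared";
	case SH_RESERVED:
		return "reserved";
	case SH_OS:
		return "shared within the GPU and with the CPU";
	case SH_IS:
		return "shared within the GPU only";
	default:
		return "invalid";
	}
}

/* Hypothetical helper: field chosen before the patch, for any format. */
unsigned int sh_before_patch(bool iommu_cache)
{
	return iommu_cache ? SH_IS : SH_OS;
}

/* Hypothetical helper: field chosen after the patch. */
unsigned int sh_after_patch(bool iommu_cache, bool is_mali)
{
	return (iommu_cache && !is_mali) ? SH_IS : SH_OS;
}
```

With these, mali_sh_meaning(sh_before_patch(true)) gives "shared within the GPU only", while sh_after_patch(..., true) gives value 2 whatever the IOMMU_CACHE setting, i.e. "always shared" on Mali.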

Steve

@@ -1049,6 +1055,9 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
        cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(data->pgd) |
                                          ARM_MALI_LPAE_TTBR_READ_INNER |
                                          ARM_MALI_LPAE_TTBR_ADRMODE_TABLE;
+       if (cfg->coherent_walk)
+               cfg->arm_mali_lpae_cfg.transtab |= ARM_MALI_LPAE_TTBR_SHARE_OUTER;
+
        return &data->iop;
out_free_data:


