On Wed, May 29, 2019 at 01:55:48PM +0200, Marc Gonzalez wrote:
> From: Robin Murphy <[email protected]>
> 
> Apparently, some Qualcomm arm64 platforms which appear to expose their
> SMMU global register space are still, in fact, using a hypervisor to
> mediate it by trapping and emulating register accesses. Sadly, some
> deployed versions of said trapping code have bugs wherein they go
> horribly wrong for stores using r31 (i.e. XZR/WZR) as the source
> register.

^^^
This should be in the comment instead of "qcom bug".

> While this can be mitigated for GCC today by tweaking the constraints
> for the implementation of writel_relaxed(), to avoid any potential
> arms race with future compilers more aggressively optimising register
> allocation, the simple way is to just remove all the problematic
> constant zeros. For the write-only TLB operations, the actual value is
> irrelevant anyway and any old nearby variable will provide a suitable
> GPR to encode. The one point at which we really do need a zero to clear
> a context bank happens before any of the TLB maintenance where crashes
> have been reported, so is apparently not a problem... :/

Hmm. It would be nice to understand this a little better. In which cases
does XZR appear to work?

> Reported-by: AngeloGioacchino Del Regno <[email protected]>
> Reviewed-by: Marc Gonzalez <[email protected]>
> Reviewed-by: Bjorn Andersson <[email protected]>
> Tested-by: AngeloGioacchino Del Regno <[email protected]>
> Tested-by: Marc Gonzalez <[email protected]>
> Tested-by: Bjorn Andersson <[email protected]>
> Signed-off-by: Robin Murphy <[email protected]>
> ---
> Changes from v1:
> - Tweak commit message (remove "compilers", s/hangs/crashes)
> - Add a comment before writel_relaxed
> ---
>  drivers/iommu/arm-smmu.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 5e54cc0a28b3..3f352268fa8b 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -423,7 +423,8 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device 
> *smmu,
>  {
>       unsigned int spin_cnt, delay;
>  
> -     writel_relaxed(0, sync);
> +     /* Write "garbage" (rather than 0) to work around a qcom bug */
> +     writel_relaxed((unsigned long)sync, sync);
>       for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
>               for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
>                       if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
> @@ -1763,8 +1764,9 @@ static void arm_smmu_device_reset(struct 
> arm_smmu_device *smmu)
>       }
>  
>       /* Invalidate the TLB, just in case */
> -     writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> -     writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
> +     /* Write "garbage" (rather than 0) to work around a qcom bug */
> +     writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLH);
> +     writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);

Any reason not to make these obviously dummy values e.g.:

        /*
         * Text from the commit message about broken hypervisor
         */
        #define QCOM_DUMMY_VAL_NOT_XZR  ~0U

That makes the callsites much easier to understand and I doubt there's a
performance impact from allocating an extra register here.

Will
_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Reply via email to