On 04/11/16 09:55, Nipun Gupta wrote:
> The SMTNMB_TLBEN bit in the Auxiliary Configuration Register (ACR) enables
> TLB updates for transactions that bypass translation because no entry in
> the stream match table matches them. This reduces the latency of
> subsequent transactions with the same stream-id that bypass the SMMU,
> which provides a significant performance benefit for certain networking
> workloads.
> 
> With this change, a substantial performance improvement of ~9% is observed
> with the DPDK l3fwd application
> (http://dpdk.org/doc/guides/sample_app_ug/l3_forward.html)
> on NXP's LS2088a platform.

Reviewed-by: Robin Murphy <[email protected]>

> Signed-off-by: Nipun Gupta <[email protected]>
> ---
> Changes for v2:
>     - Incorporated Robin's comments on v1 related to:
>       - Setting SMTNMB_TLBEN in ACR only for MMU-500, as ACR is
>         implementation dependent
>       - Code comments and naming convention
> Changes for v3:
>     - Added correct patch version
> 
>  drivers/iommu/arm-smmu.c | 25 ++++++++++++++++---------
>  1 file changed, 16 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index ce2a9d4..05901be 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -247,6 +247,7 @@ enum arm_smmu_s2cr_privcfg {
>  #define ARM_MMU500_ACTLR_CPRE                (1 << 1)
>  
>  #define ARM_MMU500_ACR_CACHE_LOCK    (1 << 26)
> +#define ARM_MMU500_ACR_SMTNMB_TLBEN  (1 << 8)
>  
>  #define CB_PAR_F                     (1 << 0)
>  
> @@ -1569,16 +1570,22 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
>       for (i = 0; i < smmu->num_mapping_groups; ++i)
>               arm_smmu_write_sme(smmu, i);
>  
> -     /*
> -      * Before clearing ARM_MMU500_ACTLR_CPRE, need to
> -      * clear CACHE_LOCK bit of ACR first. And, CACHE_LOCK
> -      * bit is only present in MMU-500r2 onwards.
> -      */
> -     reg = readl_relaxed(gr0_base + ARM_SMMU_GR0_ID7);
> -     major = (reg >> ID7_MAJOR_SHIFT) & ID7_MAJOR_MASK;
> -     if ((smmu->model == ARM_MMU500) && (major >= 2)) {
> +     if (smmu->model == ARM_MMU500) {
> +             /*
> +              * Before clearing ARM_MMU500_ACTLR_CPRE, need to
> +              * clear CACHE_LOCK bit of ACR first. And, CACHE_LOCK
> +              * bit is only present in MMU-500r2 onwards.
> +              */
> +             reg = readl_relaxed(gr0_base + ARM_SMMU_GR0_ID7);
> +             major = (reg >> ID7_MAJOR_SHIFT) & ID7_MAJOR_MASK;
>               reg = readl_relaxed(gr0_base + ARM_SMMU_GR0_sACR);
> -             reg &= ~ARM_MMU500_ACR_CACHE_LOCK;
> +             if (major >= 2)
> +                     reg &= ~ARM_MMU500_ACR_CACHE_LOCK;
> +             /*
> +              * Allow unmatched Stream IDs to allocate bypass
> +              * TLB entries for reduced latency.
> +              */
> +             reg |= ARM_MMU500_ACR_SMTNMB_TLBEN;
>               writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_sACR);
>       }
>  
> 

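For anyone who wants to check the resulting sACR value outside the kernel, the read-modify-write in the hunk above can be sketched as a pure function. This is only an illustration: mmu500_acr_reset_value() is a hypothetical helper that does not exist in the driver, the register accesses are replaced by a plain argument and return value, and the bit positions are copied from the patch.

```c
#include <stdint.h>

/* Bit positions copied from the patch. */
#define ARM_MMU500_ACR_CACHE_LOCK    (1u << 26)
#define ARM_MMU500_ACR_SMTNMB_TLBEN  (1u << 8)

/*
 * Hypothetical helper, not part of arm-smmu.c: given the current sACR
 * value and the major revision read from ID7, return the value the
 * patched reset path would write back on an MMU-500.
 */
static uint32_t mmu500_acr_reset_value(uint32_t acr, unsigned int major)
{
	/* CACHE_LOCK is only present from r2 onwards, so only clear it there. */
	if (major >= 2)
		acr &= ~ARM_MMU500_ACR_CACHE_LOCK;

	/* SMTNMB_TLBEN is set regardless of the major revision. */
	acr |= ARM_MMU500_ACR_SMTNMB_TLBEN;

	return acr;
}
```

With major below 2 the CACHE_LOCK bit position is left untouched and only bit 8 is set; from r2 onwards bit 26 is cleared as well.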