On 02/11/16 19:26, Nipun Gupta wrote: > > Hi Robin, > >> -----Original Message----- >> From: Robin Murphy [mailto:[email protected]] >> Sent: Wednesday, November 02, 2016 16:51 >> To: Nipun Gupta <[email protected]>; [email protected]; linux-arm- >> [email protected]; [email protected] >> Cc: Stuart Yoder <[email protected]> >> Subject: Re: [PATCH] iommu: arm-smmu: Set SMTNMB_TLBEN in ACR to enable >> caching of bypass entries >> >> Hi Nipun, >> >> On 02/11/16 13:35, Nipun Gupta wrote: >>> The SMTNMB_TLBEN in the Auxiliary Configuration Register (ACR) >>> provides an option to enable the updation of TLB in case of bypass >>> transactions due to no stream match in the stream match table. This >>> reduces the latencies of the subsequent transactions with the same stream-id >> which bypasses the SMMU. >>> This provides a significant performance benefit for certain networking >>> workloads. >> >> ...at the cost of increased TLB contention against other workloads. >> However, in the general case we'd expect the system to be fully described, >> so if >> there aren't any unmatched Stream IDs there hopefully shouldn't be an impact >> to leaving this switched on. I'd be interested to see some actual performance >> numbers, though - you already know my opinion about unsubstantiated quotes >> from the MMU-500 TRM. > > With this change we have seen substantial performance improvement of ~9-10% > with DPDK l3fwd application > (http://dpdk.org/doc/guides/sample_app_ug/l3_forward.html) > on NXP's LS2088a platform (single core as well as multi-core). Also, with ODP > reflector application > (loopback mode - NXP in-house) we have seen 5% improvement in performance on > LS1088 platform. > > W.r.t. the read latencies, they are reduced to avg. ~50 platform cycles from > avg. ~140 > platform cycles per memory read transactions which follow this bypass path > (on LS2088 > with DPDK l3fwd application). > > (Apologies, I cannot share the DPDK/ODP exact performance numbers on the > mailing list).
That's understandable, and I'm not sure I'd know how to interpret them anyway ;) I reckon the percentages make a sufficiently compelling qualification of the improvement, so it would be good to have that summarised in the commit log. >> >>> Signed-off-by: Nipun Gupta <[email protected]> >>> --- >>> drivers/iommu/arm-smmu.c | 21 +++++++++++++++------ >>> 1 file changed, 15 insertions(+), 6 deletions(-) >>> >>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index >>> ce2a9d4..7010a5c 100644 >>> --- a/drivers/iommu/arm-smmu.c >>> +++ b/drivers/iommu/arm-smmu.c >>> @@ -246,6 +246,7 @@ enum arm_smmu_s2cr_privcfg { >>> >>> #define ARM_MMU500_ACTLR_CPRE (1 << 1) >>> >>> +#define ACR_SMTNMB_TLBEN (1 << 8) >> >> ACR is entirely implementation-defined, so there are no generic field names. >> Please follow the naming convention handily demonstrated in the subsequent >> context line. >> >>> #define ARM_MMU500_ACR_CACHE_LOCK (1 << 26) >> >> Actually, can we also please keep these in descending order of bit position >> like >> everything else? >> >>> #define CB_PAR_F (1 << 0) >>> @@ -1569,18 +1570,26 @@ static void arm_smmu_device_reset(struct >> arm_smmu_device *smmu) >>> for (i = 0; i < smmu->num_mapping_groups; ++i) >>> arm_smmu_write_sme(smmu, i); >>> >>> + /* Get the major rev required for configuring ACR */ >> >> That comment is nonsense. >> >>> + reg = readl_relaxed(gr0_base + ARM_SMMU_GR0_ID7); >>> + major = (reg >> ID7_MAJOR_SHIFT) & ID7_MAJOR_MASK; >>> + >>> /* >>> * Before clearing ARM_MMU500_ACTLR_CPRE, need to >>> * clear CACHE_LOCK bit of ACR first. And, CACHE_LOCK >>> * bit is only present in MMU-500r2 onwards. >>> */ >>> - reg = readl_relaxed(gr0_base + ARM_SMMU_GR0_ID7); >>> - major = (reg >> ID7_MAJOR_SHIFT) & ID7_MAJOR_MASK; >>> - if ((smmu->model == ARM_MMU500) && (major >= 2)) { >>> - reg = readl_relaxed(gr0_base + ARM_SMMU_GR0_sACR); >>> + reg = readl_relaxed(gr0_base + ARM_SMMU_GR0_sACR); >>> + if ((smmu->model == ARM_MMU500) && (major >= 2)) >>> reg &= ~ARM_MMU500_ACR_CACHE_LOCK; >>> - writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_sACR); >>> - } >>> + >>> + /* >>> + * Set the SMTNMB_TLBEN in ACR so that the transactions which >>> + * bypass with SMMU due to no stream match found in the SMR table >>> + * are updated in the TLB's. >> >> Or simply, e.g. "Allow unmatched Stream IDs to allocate bypass TLB entries >> for >> reduced latency". It's already clear from the code what bit's being set >> where, we >> only need to remember *why*. >> >>> + */ >>> + reg |= ACR_SMTNMB_TLBEN; >>> + writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_sACR); >> >> Are you sure it's perfectly fine to set that implementation-defined bit on >> any >> SMMU implementation other than the two-and-a-half ARM Ltd. ones which >> happen to share the same meaning? I'm certainly not. >> >> The correct flow would be something like this: >> >> if (smmu->model == ARM_MMU500) { >> reg = readl_relaxed(gr0_base + ARM_SMMU_GR0_ID7); >> major = (reg >> ID7_MAJOR_SHIFT) & ID7_MAJOR_MASK; >> reg = readl_relaxed(gr0_base + ARM_SMMU_GR0_sACR); >> if (major >= 2) >> reg &= ~ARM_MMU500_ACR_CACHE_LOCK; >> reg |= ACR_SMTNMB_TLBEN; >> writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_sACR); >> } >> >> The shape of the current code avoids an extra level of indentation, but once >> you >> have to have the nested conditional anyway, it might as well all be >> predicated >> appropriately. >> > > Thank you for providing the useful comments. I would incorporate them all in > next version :). Cool. Just for clarity (I realise I was thinking it, but never said it outright), whilst MMU-40x do share the same feature with the same ACR bit definition as MMU-500, I'd be inclined not to bother with them. Since the monolithic microarchitecture means there's normally a separate MMU-40x per device, if you don't want translation for that device you can simply not probe the thing to turn it on in the first place. Robin. > > Regards, > Nipun > >> Robin. >> >>> /* Make sure all context banks are disabled and clear CB_FSR */ >>> for (i = 0; i < smmu->num_context_banks; ++i) { >>> > _______________________________________________ iommu mailing list [email protected] https://lists.linuxfoundation.org/mailman/listinfo/iommu
