Outside arm_smmu_cmdq_issue_cmd(), the only places we build a command
are those in which we are generating a sync for various reasons. Given
the circumstances, one might hope for GCC to be clever enough to emit a
specialisation which avoids running through the switch statement with a
known constant opcode just to copy two dwords of mostly-static data, but
apparently it needs a little more help.

Explicitly marking arm_smmu_cmdq_build_cmd() as inline reduces these
sync special cases from an out-of-line call to a neat handful of ALU
instructions at the couple of relevant sites, yet squashing the full
switch statement into arm_smmu_cmdq_issue_cmd() somehow has a knock-on
effect across various other areas of the driver for a surprising overall
code size reduction:

   text    data     bss     dec     hex filename
  16951     648       8   17607    44c7 arm-smmu-v3.o.new
  17199     648       8   17855    45bf arm-smmu-v3.o.old

Signed-off-by: Robin Murphy <[email protected]>
---

Having distilled this out of my pile of hacks, I'm fairly confident that
it's a reasonable change. Nothing in the latest SVA branch seems to have
any adverse effect either, which is reassuring.
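
For anyone wanting to poke at the effect in isolation, here is a rough
standalone sketch of the pattern. It is not the driver code: the demo_*
names, the opcode values and the field layout are all made up for
illustration. The point is only that once the builder is inlined into a
caller that passes a compile-time-constant opcode, the compiler can in
principle resolve the switch at build time and keep just the matching
case.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical opcodes and layout, NOT the real SMMUv3 encodings. */
enum demo_opcode {
	DEMO_OP_TLBI = 0x10,
	DEMO_OP_SYNC = 0x46,
};

struct demo_ent {
	uint8_t  opcode;
	uint64_t arg;
};

/*
 * Generic builder: zero two dwords, then fill in per-opcode fields.
 * Marked inline so that callers with a known opcode give the compiler
 * the chance to fold the switch away.
 */
static inline int demo_build_cmd(uint64_t cmd[2], const struct demo_ent *ent)
{
	memset(cmd, 0, 2 * sizeof(cmd[0]));
	cmd[0] |= ent->opcode;

	switch (ent->opcode) {
	case DEMO_OP_TLBI:
		cmd[1] |= ent->arg;
		break;
	case DEMO_OP_SYNC:
		cmd[0] |= ent->arg << 12;
		break;
	default:
		return -1;	/* unknown opcode */
	}

	return 0;
}

/*
 * Special-case caller: the opcode is a constant here, so after inlining
 * only the DEMO_OP_SYNC arm of the switch is reachable.
 */
static int demo_build_sync(uint64_t cmd[2], uint64_t cs)
{
	struct demo_ent ent = { .opcode = DEMO_OP_SYNC, .arg = cs };

	return demo_build_cmd(cmd, &ent);
}
```

Whether a given compiler and optimisation level actually performs that
folding is best checked by comparing the generated code with and without
the inline keyword, rather than assumed.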

 drivers/iommu/arm-smmu-v3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 1d647104bccc..94544cd9d929 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -776,7 +776,7 @@ static int queue_remove_raw(struct arm_smmu_queue *q, u64 *ent)
 }
 
 /* High-level queue accessors */
-static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
+static inline int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 {
        memset(cmd, 0, CMDQ_ENT_DWORDS << 3);
        cmd[0] |= FIELD_PREP(CMDQ_0_OP, ent->opcode);
-- 
2.17.1.dirty
