On 2021-08-17 12:34, Zhen Lei wrote:
Although the parameter 'cmd' is always passed by a local array variable,
and only this function modifies it, the compiler does not know this. The
compiler almost always reads the value of cmd[i] from memory rather than
directly using the value cached in the register. This generates many
useless instruction operations and affects the performance to some extent.
Which compiler? GCC 4.9 does not make the same codegen decisions that
GCC 10 does; Clang is different again. There are also various config
options which affect a compiler's inlining/optimisation choices either
directly or indirectly.
If it's something that newer compilers can get right anyway, then
micro-optimising just for older ones might warrant a bit more justification.
To guide the compiler for proper optimization, 'cmd' is defined as a local
array variable, marked as register, and copied to the output parameter at
a time when the function is returned.
The optimization effect can be viewed by running the "size arm-smmu-v3.o"
command.
Before:
text data bss dec hex
27602 1348 56 29006 714e
After:
text data bss dec hex
27402 1348 56 28806 7086
Signed-off-by: Zhen Lei <thunder.leiz...@huawei.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index d76bbbde558b776..50a9db5bac466c7 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -233,11 +233,19 @@ static int queue_remove_raw(struct arm_smmu_queue *q, u64
*ent)
return 0;
}
+#define arm_smmu_cmdq_copy_cmd(dst, src) \
+ do { \
+ dst[0] = src[0]; \
+ dst[1] = src[1]; \
+ } while (0)
+
/* High-level queue accessors */
-static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
+static int arm_smmu_cmdq_build_cmd(u64 *out_cmd, struct arm_smmu_cmdq_ent *ent)
{
- memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT);
- cmd[0] |= FIELD_PREP(CMDQ_0_OP, ent->opcode);
+ register u64 cmd[CMDQ_ENT_DWORDS];
+
+ cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode);
+ cmd[1] = 0;
switch (ent->opcode) {
case CMDQ_OP_TLBI_EL2_ALL:
@@ -309,6 +317,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct
arm_smmu_cmdq_ent *ent)
case PRI_RESP_SUCC:
break;
default:
+ arm_smmu_cmdq_copy_cmd(out_cmd, cmd);
Why bother writing back a partial command when we're telling the caller
it's invalid anyway?
return -EINVAL;
}
cmd[1] |= FIELD_PREP(CMDQ_PRI_1_RESP, ent->pri.resp);
@@ -329,9 +338,12 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct
arm_smmu_cmdq_ent *ent)
cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSIATTR,
ARM_SMMU_MEMATTR_OIWB);
break;
default:
+ arm_smmu_cmdq_copy_cmd(out_cmd, cmd);
Ditto.
return -ENOENT;
}
+ arm_smmu_cmdq_copy_cmd(out_cmd, cmd);
...and then it would be simpler to open-code the assignment here.
I guess if you're really concerned with avoiding temporary commands
being written back to the stack and reloaded, it might be worth
experimenting with wrapping them in a struct which can be passed around
by value - AAPCS64 allows passing a 16-byte composite type purely in
registers.
Robin.
+
return 0;
}
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu