On 2018/8/16 17:27, Robin Murphy wrote:
> On 2018-08-16 10:18 AM, Will Deacon wrote:
>> On Thu, Aug 16, 2018 at 04:21:17PM +0800, Leizhen (ThunderTown) wrote:
>>> On 2018/8/15 20:26, Robin Murphy wrote:
>>>> On 15/08/18 11:23, Zhen Lei wrote:
>>>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>>>>> index 1d64710..3f5c236 100644
>>>>> --- a/drivers/iommu/arm-smmu-v3.c
>>>>> +++ b/drivers/iommu/arm-smmu-v3.c
>>>>> @@ -566,7 +566,7 @@ struct arm_smmu_device {
>>>>>
>>>>> int gerr_irq;
>>>>> int combined_irq;
>>>>> - atomic_t sync_nr;
>>>>> + u32 sync_nr;
>>>>>
>>>>> unsigned long ias; /* IPA */
>>>>> unsigned long oas; /* PA */
>>>>> @@ -775,6 +775,11 @@ static int queue_remove_raw(struct arm_smmu_queue
>>>>> *q, u64 *ent)
>>>>> return 0;
>>>>> }
>>>>>
>>>>> +static inline void arm_smmu_cmdq_sync_set_msidata(u64 *cmd, u32 msidata)
>>>>
>>>> If we *are* going to go down this route then I think it would make sense
>>>> to move the msiaddr and CMDQ_SYNC_0_CS_MSI logic here as well; i.e.
>>>> arm_smmu_cmdq_build_cmd() always generates a "normal" SEV-based sync
>>>> command, then calling this guy would convert it to an MSI-based one.
>>>> As-is, having bits of mutually-dependent data handled across two
>>>> separate places just seems too messy and error-prone.
>>>
>>> Yes, How about create a new function "arm_smmu_cmdq_build_sync_msi_cmd"?
>>>
>>> static inline
>>> void arm_smmu_cmdq_build_sync_msi_cmd(u64 *cmd, struct arm_smmu_cmdq_ent
>>> *ent)
>>> {
>>> cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode);
>>> cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_IRQ);
>>> cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSH, ARM_SMMU_SH_ISH);
>>> cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSIATTR, ARM_SMMU_MEMATTR_OIWB);
miss: cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSIDATA, ent->sync.msidata);
>>> cmd[1] = ent->sync.msiaddr & CMDQ_SYNC_1_MSIADDR_MASK;
>>> }
>>
>> None of this seems justified given the numbers from John, so please just do
>> the simple thing and build the command with the lock held.
In order to observe the optimization effect, I conducted 5 tests for each
case. Although the test result is volatility, but we can still get which case
is good or bad. It accords with our theoretical analysis.
Test command: fio -numjobs=8 -rw=randread -runtime=30 ... -bs=4k
Test Result: IOPS, for example: read : io=86790MB, bw=2892.1MB/s, iops=740586,
runt= 30001msec
Case 1: (without these patches)
675480
672055
665275
648610
661146
Case 2: (move arm_smmu_cmdq_build_cmd into lock)
688714
697355
632951
700540
678459
Case 3: (base on case 2, replace arm_smmu_cmdq_build_cmd with
arm_smmu_cmdq_build_sync_msi_cmd)
721582
729226
689574
679710
727770
Case 4: (base on case 3, plus patch 2)
734077
742868
738194
682544
740586
Case 2 is better than case 1, I think the main reason is the
atomic_inc_return_relaxed(&smmu->sync_nr)
has been removed. Case 3 is better than case 2, because the assembly code is
reduced, see below.
>
> Agreed - sorry if my wording was unclear, but that suggestion was only for
> the possibility of it proving genuinely worthwhile to build the command
> outside the lock. Since that isn't the case, I definitely prefer the simpler
> approach too.
Yes, I mean replace arm_smmu_cmdq_build_cmd with
arm_smmu_cmdq_build_sync_msi_cmd to build the command inside the lock.
spin_lock_irqsave(&smmu->cmdq.lock, flags);
+ ent.sync.msidata = ++smmu->sync_nr;
+ arm_smmu_cmdq_build_sync_msi_cmd(cmd, &ent);
arm_smmu_cmdq_insert_cmd(smmu, cmd);
spin_unlock_irqrestore(&smmu->cmdq.lock, flags);
The assembly code showed me that it's very nice.
ffff0000085e6928: 94123207 bl ffff000008a73144
<_raw_spin_lock_irqsave>
ffff0000085e692c: b9410ad5 ldr w21, [x22,#264]
ffff0000085e6930: d28208c2 mov x2, #0x1046
// #4166
ffff0000085e6934: aa0003fa mov x26, x0
ffff0000085e6938: 110006b5 add w21, w21, #0x1
ffff0000085e693c: f2a1f802 movk x2, #0xfc0, lsl #16
ffff0000085e6940: aa1603e0 mov x0, x22
ffff0000085e6944: 910163a1 add x1, x29, #0x58
ffff0000085e6948: aa158042 orr x2, x2, x21, lsl #32
ffff0000085e694c: b9010ad5 str w21, [x22,#264]
ffff0000085e6950: f9002fa2 str x2, [x29,#88]
ffff0000085e6954: d2994016 mov x22, #0xca00
// #51712
ffff0000085e6958: f90033b3 str x19, [x29,#96]
ffff0000085e695c: 97fffd5b bl ffff0000085e5ec8
<arm_smmu_cmdq_insert_cmd>
ffff0000085e6960: aa1903e0 mov x0, x25
ffff0000085e6964: aa1a03e1 mov x1, x26
ffff0000085e6968: f2a77356 movk x22, #0x3b9a, lsl #16
ffff0000085e696c: 94123145 bl ffff000008a72e80
<_raw_spin_unlock_irqrestore>
>
> Robin.
>
> .
>
--
Thanks!
BestRegards
_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu