On 09/07/2019 09:25, Pratyush Yadav wrote:
>>> +static int queue_insert_raw(struct arm_smmu_queue *q, __u64 *ent)
>>> +{
>>> + while (queue_full(q))
>>> + {}
>>
>> It seems like queue_full() only accesses the shadow version of cons and
>> prod, so I think you need a queue_sync_cons() in this loop. But given that
>> you drain the queue for each command below, queue_full() could only happen
>> if the queue is in error state, in which case this will loop indefinitely.
>
> I'll sync the consumer in the body of the loop, just in case there is a
> change in the future that doesn't drain the queue.
>
> Also, why would queue_full() happen when the queue is in error state? In
> that function, we don't check the error code, only the write index and
> wrap. AFAIK, neither of those are affected when an error is active.
>
> So we should also add queue_error() check here, correct?
Yes I think so, and return early if queue_error(). If you keep adding new
commands while the queue is in error state, the SMMU never updates cons,
so the loop above will eventually get stuck in queue_full().
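To make the fix concrete, here is a minimal, self-contained model of the loop we are discussing. The 4-bit index width, the error flag, and the `hw_cons` parameter (standing in for a read of the hardware CONS register) are illustrative assumptions, not the actual driver code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the SMMUv3 queue index scheme: prod/cons carry a
 * wrap bit above the index bits (4 index bits = 16 slots here; the real
 * driver reads LOG2SIZE from the hardware). */
#define Q_IDX_MASK  ((1u << 4) - 1)
#define Q_WRAP_BIT  (1u << 4)

struct model_queue {
	uint32_t prod, cons;	/* shadow copies of PROD/CONS */
	bool error;		/* queue is in error state */
};

static bool queue_full(struct model_queue *q)
{
	/* Full: same index, opposite wrap bits. */
	return (q->prod & Q_IDX_MASK) == (q->cons & Q_IDX_MASK) &&
	       (q->prod & Q_WRAP_BIT) != (q->cons & Q_WRAP_BIT);
}

static bool queue_error(struct model_queue *q)
{
	return q->error;
}

/* The fix: re-sync cons on every iteration and bail out when the queue
 * is in error state, instead of spinning on stale shadow state forever.
 * @hw_cons stands in for reading the hardware CONS register. */
static int queue_insert_raw(struct model_queue *q, uint32_t hw_cons)
{
	while (queue_full(q)) {
		if (queue_error(q))
			return -1;
		q->cons = hw_cons;	/* queue_sync_cons(q) */
	}
	/* Incrementing past the top index toggles the wrap bit. */
	q->prod = (q->prod + 1) & (2 * Q_WRAP_BIT - 1);
	return 0;
}
```

Without the error check, a full queue in error state spins forever; with it, the caller gets a chance to report the failure.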
>>> +
>>> + queue_write(queue_entry(q), ent, q->ent_dwords);
>>> + queue_inc_prod(q);
>>> + while (!queue_empty(q) && !queue_error(q)) {
>>> + queue_sync_cons(q);
>>> + }
>>> + return 0;
>>> +}
[...]
>>> + /*
>>> + * Commands are written little-endian, but we want the SMMU to
>>> + * receive MSIData, and thus write it back to memory, in CPU
>>> + * byte order, so big-endian needs an extra byteswap here.
>>> + */
>>
>> The comment doesn't apply anymore, does it?
>
> Why not? I'm guessing it's because the driver is written with the
> assumption it is running on a little endian core.
>
> In that case, yes, it does not apply. This was taken verbatim from the
> Linux driver, so probably why this comment stayed around.
Yes, I think the comment applied to cpu_to_le32(ent->sync.msidata) in the
Linux driver.
>
>>> + cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSIDATA, ent->sync.msidata);
>>> + cmd[1] |= ent->sync.msiaddr & CMDQ_SYNC_1_MSIADDR_MASK;
>>> + break;
>>> + default:
>>> + return -ENOENT;
>>> + }
>>> +
>>> + return 0;
>>> +}
[...]
>>> +static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid)
>>> +{
>>> + struct arm_smmu_cmdq_ent cmd = {
>>> + .opcode = CMDQ_OP_CFGI_STE,
>>> + .cfgi = {
>>> + .sid = sid,
>>> + .leaf = true,
>>
>> You probably need leaf = false, because the SMMU could cache l1 stream
>> table descriptors. I think the Linux driver gets away with it because it
>> only modifies l1 descriptors before making the stream table visible to the
>> SMMU through SMMU_STRTAB_BASE (I haven't verified that claim).
>
> Hm, I don't think the Linux driver only modifies L1 descriptors before
> making the stream table visible to the SMMU.
>
> arm_smmu_add_device() calls arm_smmu_init_l2_strtab() which modifies the
> L1 descriptor. Since a device can be added after the SMMU is
> initialised, the Linux driver makes changes to L1 descriptors after the
> SMMU is up and the stream table is visible.
Good point, I'll look at fixing this in the Linux driver
>
> Either way, we should be invalidating the L1 descriptors when we
> actually update them. And the best place for that would be
> arm_smmu_write_strtab_l1_desc().
>
> Unfortunately, I don't think there is a command to invalidate just a L1
> descriptor. So we'll probably have to hack around it by invalidating the
> first STE for the L1 descriptor with leaf = false.
>
> Any better ideas?
No, that's what I do when I want to invalidate an L1 context descriptor:
invalidate the first L2 descriptor with leaf = false.
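A sketch of that workaround, reduced to the command construction: since there is no command that invalidates only an L1 stream-table descriptor, issue CMDQ_OP_CFGI_STE for the first stream ID covered by that descriptor with leaf = false, which also drops cached non-leaf (L1) descriptors. STRTAB_SPLIT's value and the struct layout are illustrative assumptions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STRTAB_SPLIT		8	/* assumed: 256 STEs per L2 table */
#define CMDQ_OP_CFGI_STE	0x03	/* opcode from the SMMUv3 spec */

struct cfgi_cmd {
	uint8_t opcode;
	uint32_t sid;
	bool leaf;
};

/* Build the invalidation for the L1 descriptor covering @sid. */
static struct cfgi_cmd invalidate_l1_desc(uint32_t sid)
{
	struct cfgi_cmd cmd = {
		.opcode = CMDQ_OP_CFGI_STE,
		/* First STE covered by the L1 descriptor containing @sid. */
		.sid = (sid >> STRTAB_SPLIT) << STRTAB_SPLIT,
		.leaf = false,	/* non-leaf: cached L1 descriptors too */
	};
	return cmd;
}
```

The real code would then hand this to arm_smmu_cmdq_issue_cmd() followed by a CMD_SYNC.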
>
>>> + },
>>> + };
>>> +
>>> + arm_smmu_cmdq_issue_cmd(smmu, &cmd);
>>> + arm_smmu_cmdq_issue_sync(smmu);
>>> +}
>>> +
>>> +static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
>>> + __u64 *guest_ste, __u64 *dst,
>>> + bool bypass, u32 vmid)
>>> +{
>>> + struct paging_structures *pg_structs = &this_cell()->arch.mm;
>>> + u64 val = dst[0], vttbr, mask;
>>> +
>>> + /* Bypass */
>>> + if (bypass) {
>>> + dst[0] = STRTAB_STE_0_V;
>>
>> Hm, won't the "dst[0] = val" below clear V?
>
> Yes it will. This probably got messed up in a bit of last-minute
> refactoring I did. Fixed.
>
>> STE writes in this function might need a little more care. At this point
>> for example, the SMMU could issue C_BAD_STE events because the STE is
>> marked valid but is in inconsistent state. It might be best to do the same
>> as the Linux driver: ensure dst[0].V is clear, write dst[1-3] first, issue
>> a sync(), then write dst[0] in a single access.
>
> All right. The next patch also touches this function. That will need
> fixing too.
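For reference, the ordering I have in mind looks roughly like this. Barriers and the CFGI_STE/CMD_SYNC steps are shown as comments only, and all names are assumptions modeled on the Linux driver, not the actual patch:

```c
#include <assert.h>
#include <stdint.h>

#define STRTAB_STE_0_V	(1ULL << 0)

/* Safer STE update: invalidate the STE first, fill in words 1..3, then
 * publish word 0 (which holds the V bit) with a single 64-bit store, so
 * the SMMU never sees a valid-but-inconsistent STE (C_BAD_STE). */
static void write_ste(uint64_t dst[4], const uint64_t ent[4])
{
	dst[0] = 0;		/* V clear: SMMU must ignore this STE */
	/* dsb(ish); arm_smmu_sync_ste_for_sid(smmu, sid); */
	dst[1] = ent[1];
	dst[2] = ent[2];
	dst[3] = ent[3];
	/* dsb(ish): words 1..3 visible before V is set */
	dst[0] = ent[0] | STRTAB_STE_0_V;	/* single-copy-atomic store */
	/* dsb(ish); arm_smmu_sync_ste_for_sid(smmu, sid); */
}
```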
>
>>> + dsb(ish);
>>> + val = FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
>>> + dst[0] = val;
>>> + dst[1] = FIELD_PREP(STRTAB_STE_1_SHCFG,
>>> + STRTAB_STE_1_SHCFG_INCOMING);
>>> + dst[2] = vmid;
>>> + dsb(ish);
>>> + if (smmu) {
>>> + arm_smmu_sync_ste_for_sid(smmu, sid);
>>> + }
>>> + return;
>>> + }
>>> +
>>> + dst[2] = FIELD_PREP(STRTAB_STE_2_S2VMID, vmid) |
>>
>> One more restriction on the hardware is that the VMID size used by the CPU
>> must be smaller or equal the size of the VMID supported by the SMMU. If
>> the CPU is using 16-bit VMID (TCR.VS == 1) but SMMU_IDR0.VMID16 == 0, then
>> setting the top VMID bits here is illegal.
>
> We're pulling the VMID from cell->config->id, which is the cell id. This
> is a linearly allocated number, so you'd need at least 257 cells to set
> any of the top bits. In case of Jailhouse, that would mean at least 257
> CPUs. I don't think anyone is using Jailhouse with 256+ CPUs, so this is
> not a big problem for now.
Fair enough. Note that the ThunderX2 has 256 CPUs and an SMMUv3, so I
wouldn't be surprised to see 256+ in a few years. But such hardware is
likely to set VMID16 anyway.
> And say we do overflow. Say we do have more than 257 cells running. What
> do you set the VMID to then? You can't wrap the value back because that
> would conflict with another cell. The only course then would be to
> terminate the cell, or not use the SMMU for it at all.
It might still be a good idea to check that we have enough VMID bits here
and display an error if we don't.
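Something along these lines would do; the function name and the boolean parameter (standing in for a read of SMMU_IDR0.VMID16) are illustrative assumptions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Refuse to program an STE whose VMID (the Jailhouse cell ID here)
 * does not fit in the SMMU's VMID space: 8 bits unless the SMMU
 * reports 16-bit VMID support. */
static int check_vmid_fits(uint32_t cell_id, bool smmu_has_vmid16)
{
	unsigned int vmid_bits = smmu_has_vmid16 ? 16 : 8;

	if (cell_id >= (1u << vmid_bits))
		return -1;	/* print an error, fail cell creation */
	return 0;
}
```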
>
> Not using the SMMU could work, but it means there would be complications
> when the cell expects stage 1 translations.
>
> Again, to me it looks like a problem that is not likely to occur. Jan,
> your thoughts about it?
>
>>> + FIELD_PREP(STRTAB_STE_2_VTCR, VTCR_CELL) |
>>> + STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 |
>>> + STRTAB_STE_2_S2R;
>>
>>> +
>>> + vttbr = paging_hvirt2phys(pg_structs->root_table);
>>> + dst[3] = vttbr & STRTAB_STE_3_S2TTB_MASK;
>>> +
>>> + dst[0] |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
>>> + dst[0] |= STRTAB_STE_0_V;
>>
>> Here as well, it would be better to sync the rest of the STE, then write
>> dst[0] in a single access.
>
> Will fix.
>
>>> + dsb(ish);
>>
>> Not sure that's needed, the Linux driver doesn't have a dsb() here
>
> Why not? Do you not want to make sure the writes actually go through
> before you sync the STE?
Ah, I think in Linux we rely on the writel(cmdq.prod). It has a dsb(st)
which ensures that all writes are visible to the SMMU before we publish
the new command. If you added a dsb() in queue_inc_prod(), I think you
could remove the other dsb() calls in this patch.
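That is, something like the following, with the barrier centralized at the publication point. The names and the 4-bit index width are assumptions for illustration; the barrier itself is shown as a comment:

```c
#include <assert.h>
#include <stdint.h>

#define Q_IDX_BITS	4
#define Q_OVF_MASK	((1u << (Q_IDX_BITS + 1)) - 1)	/* index + wrap */

/* If the barrier lives here, every command publication orders all
 * prior writes (command payload, STEs) before PROD becomes visible,
 * and callers can drop their own dsb() calls. */
static uint32_t queue_inc_prod(uint32_t prod)
{
	/* dsb(ishst): payload visible before PROD is published */
	return (prod + 1) & Q_OVF_MASK;	/* index overflow toggles wrap */
}
```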
>
>>> +
>>> + arm_smmu_sync_ste_for_sid(smmu, sid);
>>> +}
>> [...]
>>
>>> +static void arm_smmu_uninit_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
>>> +{
>>> + struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
>>> + struct arm_smmu_strtab_l1_desc *desc;
>>> + void *strtab;
>>> + u32 size;
>>> +
>>> + desc = &cfg->l1_desc[sid >> STRTAB_SPLIT];
>>> +
>>> + desc->active_stes--;
>>> + if (desc->active_stes)
>>> + return;
>>> +
>>> + size = 1 << (STRTAB_SPLIT + STRTAB_STE_DWORDS_BITS + 3);
>>> + page_free(&mem_pool, desc->l2ptr, PAGES(size));
>>
>> Might be safer to free the page after clearing the l1_desc and issuing a
>> CMDQ_CFGI_STE? Here the SMMU can still access the page.
>
> This function is only called during cell_exit(). Will the SMMU have any
> transactions while a cell is exiting?
The SMMU is allowed to access structures speculatively, as long as they
are reachable (pointed to by valid descriptors). See 3.21.3 in the
specification:
"An implementation is permitted to fetch or prefetch any reachable
structure at any time, as long as the generated address lies within the
bounds of the table containing the structure. An implementation is
permitted to cache any successfully fetched or prefetched configuration
structure, whether marked as valid or not, in its entirety or partially."
So there is a possible use-after-free. Even though I think it's harmless
enough, it's still a good idea to only free the table after the invalidate.
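To spell out the order I mean, here is a trace-based sketch; the step names are assumptions, and the recording mechanism exists only to make the ordering checkable:

```c
#include <assert.h>

/* Corrected teardown order: clear the L1 descriptor, invalidate cached
 * copies in the SMMU, and only then free the L2 table, so the SMMU can
 * no longer prefetch from a freed page. */
enum step { CLEAR_L1_DESC, CFGI_STE_SYNC, FREE_L2_TABLE };

static enum step trace[3];
static int n_steps;

static void record(enum step s)
{
	trace[n_steps++] = s;
}

static void uninit_l2_strtab(void)
{
	record(CLEAR_L1_DESC);	/* arm_smmu_write_strtab_l1_desc(...) */
	record(CFGI_STE_SYNC);	/* CMDQ_OP_CFGI_STE, leaf = false, + sync */
	record(FREE_L2_TABLE);	/* page_free(&mem_pool, desc->l2ptr, ...) */
}
```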
>
>>> + desc->l2ptr = NULL;
>>> + desc->l2ptr_dma = 0;
>>> + desc->span = 0;
>>> + strtab = &cfg->strtab[(sid >> STRTAB_SPLIT) * STRTAB_L1_DESC_DWORDS];
>>> + arm_smmu_write_strtab_l1_desc(strtab, desc);
>>> +
>>> + return;
>>
>> nit: not needed
>
> Fixed.
>
>> [...]
>>> +static int arm_smmuv3_cell_init(struct cell *cell)
>>> +{
>>> + struct jailhouse_iommu *iommu;
>>> + struct arm_smmu_cmdq_ent cmd;
>>> + int ret, i, j, sid;
>>> +
>>> + for (i = 0; i < JAILHOUSE_MAX_IOMMU_UNITS; i++) {
>>> + iommu = &system_config->platform_info.arm.iommu_units[i];
>>> + if (iommu->type != JAILHOUSE_IOMMU_SMMUV3)
>>> + continue;
>>> +
>>> + for_each_stream_id(sid, cell->config, j) {
>>> + ret = arm_smmu_init_ste(&smmu[i], sid,
>>> + cell->config->id);
>>> + if (ret) {
>>> + printk("ERROR: SMMU INIT ste failed: sid = %d\n",
>>> + sid);
>>> + return ret;
>>> + }
>>> + }
>>> + }
>>> +
>>> + cmd.opcode = CMDQ_OP_TLBI_S12_VMALL;
>>> + cmd.tlbi.vmid = cell->config->id;
>>> + arm_smmu_cmdq_issue_cmd(smmu, &cmd);
>>> + arm_smmu_cmdq_issue_sync(smmu);
>>
>> By the way, shouldn't there be a CMDQ_OP_TLBI_S2_IPA whenever we do
>> arch_paging_flush_page_tlbs() on the CPU side?
>
> I don't think so. Firstly, arch_paging_flush_page_tlbs() is called only
> for changes to hypervisor paging structs. Right now, the hypervisor does
> not use the SMMU at all. It is only for guest use.
>
> And the guest memory mappings are static and will remain constant from
> start till exit. So we should only need to flush these entries when
> starting a cell (which we do), and when exiting a cell.
Ok
> This makes me realise, we should be calling CMDQ_OP_TLBI_S12_VMALL on
> cell_exit() too.
Yes I think that would be better.
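Reduced to the command construction, the cell_exit counterpart would look something like this. The opcode value is from the SMMUv3 spec; the struct layout and function name are illustrative assumptions:

```c
#include <assert.h>
#include <stdint.h>

#define CMDQ_OP_TLBI_S12_VMALL	0x28	/* opcode from the SMMUv3 spec */

struct tlbi_cmd {
	uint8_t opcode;
	uint16_t vmid;
};

/* Drop all stage 1 and stage 2 TLB entries tagged with the exiting
 * cell's VMID. */
static struct tlbi_cmd cell_exit_tlbi(uint16_t cell_id)
{
	struct tlbi_cmd cmd = {
		.opcode = CMDQ_OP_TLBI_S12_VMALL,
		.vmid = cell_id,	/* VMID == Jailhouse cell ID */
	};
	/* Followed by arm_smmu_cmdq_issue_cmd() and a CMD_SYNC. */
	return cmd;
}
```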
Thanks,
Jean