Hi Rob,

On 10/02/17 18:41, Rob Clark wrote:
> For devices with iommu(s) in secure mode, we cannot touch global
> registers, and we have to live with the context -> sid mapping that
> the secure world has set up for us.
> 
> This enables, for example db410c (apq8016) devices to use the up-
> stream arm-smmu driver.  This is the last major hurdle for having
> the gpu working out of the box on an upstream kernel.
> 
> NOTE: at this point, it works but I haven't spent any time thinking
> much about the bindings.  Since we can't read the global regs, we
> need to get all the device config info from DT (or at least anything
> that can't be hard-coded).

This approach seems, I have to say, unworkably horrible. I'm absolutely
against the idea of pretending we have access to global state which we
don't, then piling a facsimile of that state into the DT for no reason
other than to keep the overcomplicated pretence up. This configuration
comes out looking and behaving like a discrete IOMMU (see e.g.
rockchip-iommu for inspiration) - if we have to support it, it would
make far more sense to simply describe what we have, i.e. a
"qcom,apq8016-smmu-context-bank" with a single interrupt and
#iommu-cells = <0>. They'd want their own probe routine and private data
(AFAICS more or less just a base address, an IRQ, ID features and an
iommu_group), and probably a separate iommu_ops because you'd want to
handle {add,remove}_device() and device_group() significantly
differently. By the time we get up to {attach,remove}_dev() it might be
clean enough to dynamically handle both cases in the same code,
especially with a little general refactor of the ARM_SMMU_CB*
arithmetic, and once we get to operating on arm_smmu_domains there
should be no real difference.

> Also, this only works for non-secure contexts.  For secure contexts,
> even things like map/unmap need to go via scm, so that bypasses
> even more of arm-smmu.  I'm not sure if it is better to have all of
> those special paths added to arm-smmu.  Or have a 2nd iommu driver
> for the secure contexts.  (I think the second path only works since
> I don't think the CPU is really touching the hw at all for secure
> contexts.)
> 
> Not in lieu of bindings docs (which will come before this is more
> than just an RFC), but here is an example of what the DT bindings
> look like:
> 
>       gpu_iommu: qcom,iommu@1f00000 {
>               compatible = "qcom,smmu-v2";
>               reg = <0x1f00000 0x10000>;
> 
>               #global-interrupts = <1>;
>               interrupts =
>                       <GIC_SPI 43 0>,        // global
>                       <GIC_SPI 241 0>,       // unk0
>                       <GIC_SPI 241 0>,       // gfx3d_user
>                       <GIC_SPI 242 0>;       // gfx3d_priv
> 
>               qcom,stream-to-cb = <
>                       0x0002       // unk0
>                       0x0000       // gfx3d_user
>                       0x0001       // gfx3d_priv
>               >;
> 
>               #iommu-cells = <1>;
> 
>               clocks = <&gcc GCC_SMMU_CFG_CLK>,
>                        <&gcc GCC_GFX_TCU_CLK>;
>               clock-names = "smmu_iface_clk", "smmu_bus_clk";
> 
>               qcom,cb-count = <3>;
>               qcom,iommu-secure-id = <18>;
>               qcom,mapping-groups-count = <3>;
> 
>               status = "okay";
>       };
> 
> Since we must live with the assignment of stream-id's to context bank
> mapping that the secure world has set up for us, the "qcom,stream-to-cb"
> binding gives a mapping table of sid to cb.  (Indexed by cb, value is
> the sid.)  I'm not 100% sure what to do about devices with multiple
> sid's..  if I understand how things work properly, we could probably
> make the values in this table the result of OR'ing all the sids
> together.
> 
> The "qcom,cb-count" and "qcom,mapping-groups-count" can go away and
> be computed from "qcom,stream-to-cb".  I just haven't done that yet.
> The "qcom,iommu-secure-id" field is needed for the one scm call needed
> for non-secure contexts for initial configuration.
> 
> Anyways, at this point, I'm mostly just looking for feedback about
> whether this is the best way forward, vs introducing a seperate iommu
> driver, and any suggestions anyone might have.  And any ideas about how
> to best handle the secure context banks, since I think we have no
> choice but to use them for venus (the video enc/dec block).

I'd say the answer to how we handle secure contexts is "we don't".
Disregarding the apparent craziness of why the non-secure world would be
allowed direct control of an entirely secure device, at that point it's
not really an ARM SMMU any more, it's just *some*
firmware-paravirtualised IOMMU. If you can't touch the hardware, who
knows what's actually under there; "qcom-scm-iommu" is welcome to its
own driver.

Robin.

> Signed-off-by: Rob Clark <[email protected]>
> ---
>  drivers/iommu/arm-smmu.c | 233 
> +++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 217 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 938417e..7b7d05f 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -52,6 +52,9 @@
>  #include <linux/slab.h>
>  #include <linux/spinlock.h>
>  
> +#include <linux/qcom_scm.h>
> +
>  #include <linux/amba/bus.h>
>  
>  #include "io-pgtable.h"
> @@ -309,6 +312,7 @@ enum arm_smmu_implementation {
>       GENERIC_SMMU,
>       ARM_MMU500,
>       CAVIUM_SMMUV2,
> +     QCOM_SMMUV2,
>  };
>  
>  struct arm_smmu_s2cr {
> @@ -380,6 +384,11 @@ struct arm_smmu_device {
>       struct arm_smmu_s2cr            *s2crs;
>       struct mutex                    stream_map_mutex;
>  
> +     void __iomem                    *local_base; /* specific to qcom */
> +     u32                             *stream_map; /* specific to qcom */
> +     u32                             sec_id;      /* specific to qcom */
> +     DECLARE_BITMAP(context_init, ARM_SMMU_MAX_CBS); /* specific to qcom */
> +
>       unsigned long                   va_size;
>       unsigned long                   ipa_size;
>       unsigned long                   pa_size;
> @@ -599,6 +608,9 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device 
> *smmu)
>       int count = 0;
>       void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
>  
> +     if (smmu->model == QCOM_SMMUV2)
> +             return;
> +
>       writel_relaxed(0, gr0_base + ARM_SMMU_GR0_sTLBGSYNC);
>       while (readl_relaxed(gr0_base + ARM_SMMU_GR0_sTLBGSTATUS)
>              & sTLBGSTATUS_GSACTIVE) {
> @@ -739,19 +751,17 @@ static irqreturn_t arm_smmu_global_fault(int irq, void 
> *dev)
>       return IRQ_HANDLED;
>  }
>  
> -static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
> -                                    struct io_pgtable_cfg *pgtbl_cfg)
> +static void arm_smmu_init_context_bank_global(struct arm_smmu_domain 
> *smmu_domain,
> +                                           struct io_pgtable_cfg *pgtbl_cfg)
>  {
> -     u32 reg, reg2;
> -     u64 reg64;
> +     u32 reg;
>       bool stage1;
>       struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
>       struct arm_smmu_device *smmu = smmu_domain->smmu;
> -     void __iomem *cb_base, *gr1_base;
> +     void __iomem *gr1_base;
>  
>       gr1_base = ARM_SMMU_GR1(smmu);
>       stage1 = cfg->cbar != CBAR_TYPE_S2_TRANS;
> -     cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx);
>  
>       if (smmu->version > ARM_SMMU_V1) {
>               if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64)
> @@ -782,6 +792,20 @@ static void arm_smmu_init_context_bank(struct 
> arm_smmu_domain *smmu_domain,
>               reg |= ARM_SMMU_CB_VMID(smmu, cfg) << CBAR_VMID_SHIFT;
>       }
>       writel_relaxed(reg, gr1_base + ARM_SMMU_GR1_CBAR(cfg->cbndx));
> +}
> +
> +static void arm_smmu_init_context_bank_ctx(struct arm_smmu_domain 
> *smmu_domain,
> +                                        struct io_pgtable_cfg *pgtbl_cfg)
> +{
> +     u32 reg, reg2;
> +     u64 reg64;
> +     bool stage1;
> +     struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> +     struct arm_smmu_device *smmu = smmu_domain->smmu;
> +     void __iomem *cb_base;
> +
> +     stage1 = cfg->cbar != CBAR_TYPE_S2_TRANS;
> +     cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx);
>  
>       /* TTBRs */
>       if (stage1) {
> @@ -848,8 +872,41 @@ static void arm_smmu_init_context_bank(struct 
> arm_smmu_domain *smmu_domain,
>       writel_relaxed(reg, cb_base + ARM_SMMU_CB_SCTLR);
>  }
>  
> +static int qcom_smmu_find_context_bank(struct arm_smmu_device *smmu,
> +                                    struct iommu_fwspec *fwspec)
> +{
> +     int i;
> +
> +     /* TODO how to handle devices w/ multiple stream-ids?  I think
> +      * things are arranged such that I can OR all the sids together
> +      * and match that to an entry in stream-map??
> +      */
> +     u32 sid = fwspec->ids[0];
> +
> +     for (i = 0; i < smmu->num_context_banks; i++)
> +             if (smmu->stream_map[i] == sid)
> +                     return i;
> +
> +     dev_err(smmu->dev, "Failed to find qcom,stream-to-cb entry for 
> 0x%0x\n", sid);
> +
> +     return -ENXIO;
> +}
> +
> +static int qcom_smmu_sec_program_iommu(struct arm_smmu_device *smmu,
> +                                    struct arm_smmu_domain *smmu_domain)
> +{
> +     if (smmu->local_base) {
> +#define SMMU_INTR_SEL_NS     (0x2000)
> +             writel_relaxed(0xffffffff, smmu->local_base + SMMU_INTR_SEL_NS);
> +             mb();
> +     }
> +
> +     return qcom_scm_restore_sec_cfg(smmu->sec_id, smmu_domain->cfg.cbndx);
> +}
> +
>  static int arm_smmu_init_domain_context(struct iommu_domain *domain,
> -                                     struct arm_smmu_device *smmu)
> +                                     struct arm_smmu_device *smmu,
> +                                     struct iommu_fwspec *fwspec)
>  {
>       int irq, start, ret = 0;
>       unsigned long ias, oas;
> @@ -953,12 +1010,22 @@ static int arm_smmu_init_domain_context(struct 
> iommu_domain *domain,
>               goto out_unlock;
>       }
>  
> -     ret = __arm_smmu_alloc_bitmap(smmu->context_map, start,
> -                                   smmu->num_context_banks);
> -     if (ret < 0)
> -             goto out_unlock;
> +     if (smmu->model == QCOM_SMMUV2) {
> +             /* need to use sid->cb mapping that is already set up: */
> +             ret = qcom_smmu_find_context_bank(smmu, fwspec);
> +             if (ret < 0)
> +                     goto out_unlock;
> +
> +             cfg->cbndx = ret;
> +     } else {
> +             ret = __arm_smmu_alloc_bitmap(smmu->context_map, start,
> +                             smmu->num_context_banks);
> +             if (ret < 0)
> +                     goto out_unlock;
> +             cfg->cbndx = ret;
> +     }
> +
>  
> -     cfg->cbndx = ret;
>       if (smmu->version < ARM_SMMU_V2) {
>               cfg->irptndx = atomic_inc_return(&smmu->irptndx);
>               cfg->irptndx %= smmu->num_context_irqs;
> @@ -986,8 +1053,23 @@ static int arm_smmu_init_domain_context(struct 
> iommu_domain *domain,
>       domain->geometry.aperture_end = (1UL << ias) - 1;
>       domain->geometry.force_aperture = true;
>  
> -     /* Initialise the context bank with our page table cfg */
> -     arm_smmu_init_context_bank(smmu_domain, &pgtbl_cfg);
> +     if (smmu->model == QCOM_SMMUV2) {
> +             /* initializing the hw must happen only on first attach..
> +              * which we otherwise run afoul of since a _DMA domain
> +              * is automatically attached before drm/msm overrides
> +              * and attaches it's own domain.
> +              */
> +             if (!test_and_set_bit(cfg->cbndx, smmu->context_init)) {
> +                     ret = qcom_smmu_sec_program_iommu(smmu, smmu_domain);
> +                     if (ret)
> +                             goto out_unlock;
> +             }
> +     } else {
> +             /* Initialise the context bank with our page table cfg */
> +             arm_smmu_init_context_bank_global(smmu_domain, &pgtbl_cfg);
> +     }
> +
> +     arm_smmu_init_context_bank_ctx(smmu_domain, &pgtbl_cfg);
>  
>       /*
>        * Request context fault interrupt. Do this last to avoid the
> @@ -1107,6 +1189,8 @@ static void arm_smmu_write_s2cr(struct arm_smmu_device 
> *smmu, int idx)
>  
>  static void arm_smmu_write_sme(struct arm_smmu_device *smmu, int idx)
>  {
> +     if (smmu->model == QCOM_SMMUV2)
> +             return;
>       arm_smmu_write_s2cr(smmu, idx);
>       if (smmu->smrs)
>               arm_smmu_write_smr(smmu, idx);
> @@ -1288,6 +1372,10 @@ static int arm_smmu_domain_add_master(struct 
> arm_smmu_domain *smmu_domain,
>               s2cr[idx].type = type;
>               s2cr[idx].privcfg = S2CR_PRIVCFG_DEFAULT;
>               s2cr[idx].cbndx = cbndx;
> +
> +             if (smmu->model == QCOM_SMMUV2)
> +                     continue;
> +
>               arm_smmu_write_s2cr(smmu, idx);
>       }
>       return 0;
> @@ -1317,7 +1405,7 @@ static int arm_smmu_attach_dev(struct iommu_domain 
> *domain, struct device *dev)
>  
>       smmu = fwspec_smmu(fwspec);
>       /* Ensure that the domain is finalised */
> -     ret = arm_smmu_init_domain_context(domain, smmu);
> +     ret = arm_smmu_init_domain_context(domain, smmu, fwspec);
>       if (ret < 0)
>               return ret;
>  
> @@ -1682,6 +1770,9 @@ static void arm_smmu_device_reset(struct 
> arm_smmu_device *smmu)
>       int i;
>       u32 reg, major;
>  
> +     if (smmu->model == QCOM_SMMUV2)
> +             return;
> +
>       /* clear global FSR */
>       reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sGFSR);
>       writel(reg, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sGFSR);
> @@ -1764,6 +1855,103 @@ static void arm_smmu_device_reset(struct 
> arm_smmu_device *smmu)
>       writel(reg, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
>  }
>  
> +/* for devices with qcom,smmu-v2, we cannot touch the global space,
> + * so get all the configuration from devicetree.
> + */
> +static int qcom_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
> +{
> +     struct device_node *np = smmu->dev->of_node;
> +     u32 size, cb_offset, i;
> +     int ret;
> +
> +     if (of_property_read_u32(np, "qcom,cb-count", 
> &smmu->num_context_banks)) {
> +             dev_err(smmu->dev, "missing qcom,cb-count property\n");
> +             return -ENODEV;
> +     }
> +
> +     dev_notice(smmu->dev, "\t%u context banks\n", smmu->num_context_banks);
> +
> +     // TODO get from dt?
> +     smmu->pgsize_bitmap = SZ_4K;
> +
> +     dev_notice(smmu->dev, "\tSupported page sizes: 0x%08lx\n",
> +                smmu->pgsize_bitmap);
> +
> +     if (of_property_read_u32(np, "qcom,mapping-groups-count", &size)) {
> +             dev_err(smmu->dev, "missing qcom,cb-count property\n");
> +             return -ENODEV;
> +     }
> +
> +     if (of_property_read_u32(np, "qcom,cb-base-offset", &cb_offset))
> +             cb_offset = 0x8000;
> +
> +     if (of_property_read_u32(np, "qcom,iommu-secure-id", &smmu->sec_id)) {
> +             dev_err(smmu->dev, "missing qcom,iommu-secure-id property\n");
> +             return -ENODEV;
> +     }
> +
> +     /*
> +      * Bit of a hack perhaps.. ARM_SMMU_CB_BASE() assumes that cb0 starts
> +      * half way through the IO region size.. not sure that is really true.
> +      */
> +     smmu->size = cb_offset << 1;
> +     smmu->pgshift = 12;
> +
> +     smmu->stream_map = devm_kmalloc_array(smmu->dev, 
> smmu->num_context_banks,
> +                                           sizeof(*smmu->stream_map),
> +                                           GFP_KERNEL);
> +     if (!smmu->stream_map)
> +             return -ENOMEM;
> +
> +     for (i = 0; i < smmu->num_context_banks; i++) {
> +             ret = of_property_read_u32_index(np, "qcom,stream-to-cb", i,
> +                                              &smmu->stream_map[i]);
> +             if (ret) {
> +                     dev_err(smmu->dev, "could not read qcom,stream-to-cb 
> idx %d\n", i);
> +                     return ret;
> +             }
> +     }
> +
> +     /*
> +      * The hardware does actually support stream-matching, but we cannot
> +      * touch the global space, so we have to live with how secure world
> +      * has set things up for us.  So don't set ARM_SMMU_FEAT_STREAM_MATCH
> +      * and leave smmu->smrs as NULL.  We just need to arrange that each
> +      * domain gets mapped to the correct context in s2crs.
> +      */
> +
> +     smmu->s2crs = devm_kmalloc_array(smmu->dev, size, sizeof(*smmu->s2crs),
> +                                      GFP_KERNEL);
> +     if (!smmu->s2crs)
> +             return -ENOMEM;
> +
> +     for (i = 0; i < size; i++)
> +             smmu->s2crs[i] = s2cr_init_val;
> +
> +     smmu->num_mapping_groups = size;
> +
> +     // *shrug*?
> +     smmu->streamid_mask = ~0;
> +     smmu->smr_mask_mask = ~0;
> +     smmu->features = ARM_SMMU_FEAT_TRANS_S1 | ARM_SMMU_FEAT_FMT_AARCH32_L;
> +
> +     /* TODO we might need to be able to use this on some 32b devices? */
> +     smmu->va_size  = 48;
> +     smmu->ipa_size = 48;
> +     smmu->pa_size  = 48;
> +
> +     // XXX deduplicate from arm_smmu_device_cfg_probe()
> +     if (smmu->features &
> +         (ARM_SMMU_FEAT_FMT_AARCH32_L | ARM_SMMU_FEAT_FMT_AARCH64_4K))
> +             smmu->pgsize_bitmap |= SZ_4K | SZ_2M | SZ_1G;
> +     if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_16K)
> +             smmu->pgsize_bitmap |= SZ_16K | SZ_32M;
> +     if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_64K)
> +             smmu->pgsize_bitmap |= SZ_64K | SZ_512M;
> +
> +     return 0;
> +}
> +
>  static int arm_smmu_id_size_to_bits(int size)
>  {
>       switch (size) {
> @@ -1833,6 +2021,11 @@ static int arm_smmu_device_cfg_probe(struct 
> arm_smmu_device *smmu)
>       dev_notice(smmu->dev, "SMMUv%d with:\n",
>                       smmu->version == ARM_SMMU_V2 ? 2 : 1);
>  
> +     mutex_init(&smmu->stream_map_mutex);
> +
> +     if (smmu->model == QCOM_SMMUV2)
> +             return qcom_smmu_device_cfg_probe(smmu);
> +
>       /* ID0 */
>       id = readl_relaxed(gr0_base + ARM_SMMU_GR0_ID0);
>  
> @@ -1918,7 +2111,6 @@ static int arm_smmu_device_cfg_probe(struct 
> arm_smmu_device *smmu)
>               smmu->s2crs[i] = s2cr_init_val;
>  
>       smmu->num_mapping_groups = size;
> -     mutex_init(&smmu->stream_map_mutex);
>  
>       if (smmu->version < ARM_SMMU_V2 || !(id & ID0_PTFS_NO_AARCH32)) {
>               smmu->features |= ARM_SMMU_FEAT_FMT_AARCH32_L;
> @@ -2034,6 +2226,7 @@ static struct arm_smmu_match_data name = { .version = 
> ver, .model = imp }
>  
>  ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU);
>  ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU);
> +ARM_SMMU_MATCH_DATA(qcom_smmuv2, ARM_SMMU_V2, QCOM_SMMUV2);
>  ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU);
>  ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
>  ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
> @@ -2041,6 +2234,7 @@ ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, 
> CAVIUM_SMMUV2);
>  static const struct of_device_id arm_smmu_of_match[] = {
>       { .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 },
>       { .compatible = "arm,smmu-v2", .data = &smmu_generic_v2 },
> +     { .compatible = "qcom,smmu-v2", .data = &qcom_smmuv2 },
>       { .compatible = "arm,mmu-400", .data = &smmu_generic_v1 },
>       { .compatible = "arm,mmu-401", .data = &arm_mmu401 },
>       { .compatible = "arm,mmu-500", .data = &arm_mmu500 },
> @@ -2172,6 +2366,13 @@ static int arm_smmu_device_probe(struct 
> platform_device *pdev)
>               return PTR_ERR(smmu->base);
>       smmu->size = resource_size(res);
>  
> +     if (smmu->model == QCOM_SMMUV2) {
> +             res = platform_get_resource_byname(pdev, IORESOURCE_MEM,
> +                                                "smmu_local_base");
> +             if (res)
> +                     smmu->local_base = devm_ioremap_resource(dev, res);
> +     }
> +
>       num_irqs = 0;
>       while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, num_irqs))) {
>               num_irqs++;
> 

_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Reply via email to