>>> - When submitting commands to the GPU, the GPU driver will call
>>> pm_runtime_get_sync() on the GPU device, which will automatically do
>>> the same on all the linked suppliers, which would also include the
>>> SMMU itself. The role of device links here is exactly that the GPU
>>> driver doesn't have to care which other devices need to be brought up.
>> This is true. Assuming that the device link works correctly, we would not
>> need to explicitly power the SMMU, which makes my point entirely moot.
> Just to point out what motivated this patchset, the biggest problem is
> iommu_unmap(), because that can happen when the GPU is not powered on (or,
> in the v4l2 case, because some other device dropped its reference to
> the dma-buf, allowing it to be freed).  Currently we pm get/put the
> GPU device around unmap, but it is kinda silly to boot up the GPU just
> to unmap a buffer.
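
For illustration only, here is a tiny userspace model of the device-link behaviour described in the quoted text above: taking the first reference on a consumer (the GPU) brings up its suppliers (the SMMU) first, and the last put releases them again. All names are hypothetical; this is not the kernel's runtime PM API, and it ignores autosuspend, asynchronous suspend and locking. In the real code, suppliers are resumed as part of resuming the consumer rather than on every get.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of runtime PM plus device links.
 * Not the kernel API: no autosuspend, no async work, no locking.
 */
#define MODEL_MAX_SUPPLIERS 4

struct model_dev {
	const char *name;
	int usage_count;   /* references currently held on this device */
	bool active;       /* true while the device is "powered" */
	struct model_dev *suppliers[MODEL_MAX_SUPPLIERS];
	size_t nr_suppliers;
};

/* Record that @consumer needs @supplier powered (what a device link expresses). */
static void model_link_add(struct model_dev *consumer, struct model_dev *supplier)
{
	assert(consumer->nr_suppliers < MODEL_MAX_SUPPLIERS);
	consumer->suppliers[consumer->nr_suppliers++] = supplier;
}

/* First reference: bring up all suppliers, then the device itself. */
static void model_get_sync(struct model_dev *dev)
{
	if (dev->usage_count++ > 0)
		return;
	for (size_t i = 0; i < dev->nr_suppliers; i++)
		model_get_sync(dev->suppliers[i]);
	dev->active = true;
}

/* Last reference: power the device down, then drop its supplier references. */
static void model_put(struct model_dev *dev)
{
	assert(dev->usage_count > 0);
	if (--dev->usage_count > 0)
		return;
	dev->active = false;
	for (size_t i = 0; i < dev->nr_suppliers; i++)
		model_put(dev->suppliers[i]);
}
```

With two consumers (say a GPU and a video decoder) linked to the same SMMU, the SMMU in this model stays up until both consumers have dropped their references, and neither consumer's driver ever touches the SMMU directly.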

Note that in V4L2 both mapping and unmapping can happen completely
without involving the driver. So AFAICT the approach implemented by
this patchset will not work, because there will be no one to power up
the IOMMU before the operation. Moreover, there are platforms on which
there is no reason to power up the IOMMU just for map/unmap: when the
IOMMU is powered off, its hardware state is lost anyway, so the only
real work needed is updating the page tables in memory. (I suspect this
is true for most platforms in the wild, but that is based purely on the
not-so-small number of platforms I have worked with; I haven't looked
for more general evidence.)
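
A minimal sketch of the idea in the previous paragraph, as a userspace model with hypothetical names: the page table is plain memory that map/unmap always update, and the hardware TLB is only invalidated when the IOMMU happens to be powered. In a real driver the "is it powered?" check would also have to hold off a concurrent suspend (the kernel's pm_runtime_get_if_in_use(), which only takes a reference if the device is already active and in use, is one existing helper for that); the model below just reads a boolean and ignores concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an IOMMU whose page table lives in ordinary RAM. */
#define MODEL_NR_PTES 16

struct model_iommu {
	bool powered;                   /* current power state of the IOMMU */
	uint64_t ptes[MODEL_NR_PTES];   /* page table entries, in memory */
	unsigned int tlb_flushes;       /* hardware invalidations issued */
};

/* Writing a PTE only touches memory, so the IOMMU need not be powered. */
static void model_map(struct model_iommu *mmu, size_t idx, uint64_t pte)
{
	assert(idx < MODEL_NR_PTES);
	mmu->ptes[idx] = pte;
}

/*
 * Clear the PTE in memory, then invalidate the TLB only if the hardware
 * is powered. If it is off, its TLB contents are already gone, and the
 * in-memory table is what it will use after the next power-up.
 */
static void model_unmap(struct model_iommu *mmu, size_t idx)
{
	assert(idx < MODEL_NR_PTES);
	mmu->ptes[idx] = 0;
	if (mmu->powered)
		mmu->tlb_flushes++;     /* stand-in for a TLB invalidation register write */
}
```

In this model, unmapping while the IOMMU is off costs no hardware access at all, so nobody (driver, V4L2 core or dma-buf exporter) has to power anything up just to free a buffer.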

> (Semi-related, I would also like to batch map/unmaps; I just haven't
> gotten around to implementing it yet, but that would be another case
> where a single get_supplier()/put_supplier() outside of the iommu
> would make sense instead of pm_get/put() inside the iommu driver's
> ->unmap().)
> If you really dislike the get/put_supplier() approach, then perhaps we
> need iommu_pm_get()/iommu_pm_put() operations that the iommu user
> could use to accomplish the same thing?

I'm afraid this wouldn't work for V4L2 either. And I still haven't
been given any evidence that the approach I'm suggesting, which relies
only on existing pieces of infrastructure and which worked for both
Exynos and Rockchip, including V4L2, wouldn't work for SMMU and/or QC

