On Fri, Feb 11, 2022 at 12:15:44AM +0100, Janne Grunau wrote:
> On 2022-02-09 17:31:16 +0100, Thierry Reding wrote:
> > On Sun, Feb 06, 2022 at 11:27:00PM +0100, Janne Grunau wrote:
> > > On 2021-09-15 17:19:39 +0200, Thierry Reding wrote:
> > > > On Tue, Sep 07, 2021 at 07:44:44PM +0200, Thierry Reding wrote:
> > > > > On Tue, Sep 07, 2021 at 10:33:24AM -0500, Rob Herring wrote:
> > > > > > On Fri, Sep 3, 2021 at 10:36 AM Thierry Reding 
> > > > > > <thierry.red...@gmail.com> wrote:
> > > > > > >
> > > > > > > On Fri, Sep 03, 2021 at 09:36:33AM -0500, Rob Herring wrote:
> > > > > > > > On Fri, Sep 3, 2021 at 8:52 AM Thierry Reding 
> > > > > > > > <thierry.red...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Sep 03, 2021 at 08:20:55AM -0500, Rob Herring wrote:
> > > > > > > > > >
> > > > > > > > > > > Couldn't we keep this all in /reserved-memory? Just add an iova
> > > > > > > > > > > version of reg. Perhaps abuse 'assigned-address' for this purpose. The
> > > > > > > > > > > issue I see would be handling reserved iova areas without a physical
> > > > > > > > > > > area. That can be handled with just an iova and no reg. We already have
> > > > > > > > > > > a no reg case.
> > > > > > > > >
> > > > > > > > > I had thought about that initially. One thing I'm worried 
> > > > > > > > > about is that
> > > > > > > > > every child node in /reserved-memory will effectively cause 
> > > > > > > > > the memory
> > > > > > > > > that it described to be reserved. But we don't want that for 
> > > > > > > > > regions
> > > > > > > > > that are "virtual only" (i.e. IOMMU reservations).
> > > > > > > >
> > > > > > > > By virtual only, you mean no physical mapping, just a region of
> > > > > > > > virtual space, right? For that we'd have no 'reg' and therefore 
> > > > > > > > no
> > > > > > > > (physical) reservation by the OS. It's similar to non-static 
> > > > > > > > regions.
> > > > > > > > You need a specific handler for them. We'd probably want a 
> > > > > > > > compatible
> > > > > > > > as well for these virtual reservations.
> > > > > > >
> > > > > > > Yeah, these would be purely used for reserving regions in the 
> > > > > > > IOVA so
> > > > > > > that they won't be used by the IOVA allocator. Typically these 
> > > > > > > would be
> > > > > > > used for cases where those addresses have some special meaning.
> > > > > > >
> > > > > > > Do we want something like:
> > > > > > >
> > > > > > >         compatible = "iommu-reserved";
> > > > > > >
> > > > > > > for these? Or would that need to be:
> > > > > > >
> > > > > > >         compatible = "linux,iommu-reserved";
> > > > > > >
> > > > > > > ? There seems to be a mix of vendor-prefix vs. non-vendor-prefix
> > > > > > > compatible strings in the reserved-memory DT bindings directory.
> > > > > > 
> > > > > > I would not use 'linux,' here.
> > > > > > 
> > > > > > >
> > > > > > > On the other hand, do we actually need the compatible string? 
> > > > > > > Because we
> > > > > > > don't really want to associate much extra information with this 
> > > > > > > like we
> > > > > > > do for example with "shared-dma-pool". The logic to handle this 
> > > > > > > would
> > > > > > > all be within the IOMMU framework. All we really need is for the
> > > > > > > standard reservation code to skip nodes that don't have a reg 
> > > > > > > property
> > > > > > > so we don't reserve memory for "virtual-only" allocations.
> > > > > > 
> > > > > > It doesn't hurt to have one and I can imagine we might want to 
> > > > > > iterate
> > > > > > over all the nodes. It's slightly easier and more common to iterate
> > > > > > over compatible nodes rather than nodes with some property.
> > > > > > 
> > > > > > > > Are these being global in DT going to be a problem? Presumably 
> > > > > > > > we have
> > > > > > > > a virtual space per IOMMU. We'd know which IOMMU based on a 
> > > > > > > > device's
> > > > > > > > 'iommus' and 'memory-region' properties, but within 
> > > > > > > > /reserved-memory
> > > > > > > > we wouldn't be able to distinguish overlapping addresses from 
> > > > > > > > separate
> > > > > > > > address spaces. Or we could have 2 different IOVAs for 1 
> > > > > > > > physical
> > > > > > > > space. That could be solved with something like this:
> > > > > > > >
> > > > > > > > iommu-addresses = <&iommu1 <address cells> <size cells>>;
> > > > > > >
> > > > > > > The only case that would be problematic would be if we have 
> > > > > > > overlapping
> > > > > > > physical regions, because that will probably trip up the standard 
> > > > > > > code.
> > > > > > >
> > > > > > > But this could also be worked around by looking at 
> > > > > > > iommu-addresses. For
> > > > > > > example, if we had something like this:
> > > > > > >
> > > > > > >         reserved-memory {
> > > > > > >                 fb_dc0: fb@80000000 {
> > > > > > >                         reg = <0x80000000 0x01000000>;
> > > > > > >                         iommu-addresses = <0xa0000000 0x01000000>;
> > > > > > >                 };
> > > > > > >
> > > > > > >                 fb_dc1: fb@80000000 {
> > > > > > 
> > > > > > You can't have 2 nodes with the same name (actually, you can, they
> > > > > > just get merged together). Different names with the same 
> > > > > > unit-address
> > > > > > is a dtc warning. I'd really like to make that a full blown
> > > > > > overlapping region check.
> > > > > 
> > > > > Right... so this would be a lot easier to deal with using that earlier
> > > > > proposal where the IOMMU regions were a separate thing and referencing
> > > > > the reserved-memory nodes. In those cases we could just have the
> > > > > physical reservation for the framebuffer once (so we don't get any
> > > > > duplicates or overlaps) and then have each IOVA reservation reference
> > > > > that to create the mapping.
> > > > > 
> > > > > > 
> > > > > > >                         reg = <0x80000000 0x01000000>;
> > > > > > >                         iommu-addresses = <0xb0000000 0x01000000>;
> > > > > > >                 };
> > > > > > >         };
> > > > > > >
> > > > > > > We could make the code identify that this is for the same physical
> > > > > > > reservation (maybe make it so that reg needs to match exactly for 
> > > > > > > this
> > > > > > > to be recognized) but with different virtual allocations.
> > > > > > >
> > > > > > > On a side-note: do we really need to repeat the size? I'd think 
> > > > > > > if we
> > > > > > > want mappings then we'd likely want them for the whole 
> > > > > > > reservation.
> > > > > > 
> > > > > > Humm, I suppose not, but dropping it paints us into a corner if we
> > > > > > come up with wanting a different size later. You could have a 
> > > > > > carveout
> > > > > > for double/triple buffering your framebuffer, but the bootloader
> > > > > > framebuffer is only single buffered. So would you want actual size?
> > > > > 
> > > > > > Perhaps this needs to be a bit more verbose then. If we want the ability
> > > > > > to create a mapping for only a partial reservation, I could imagine we
> > > > > > may as well want one that doesn't start at the beginning. So perhaps an
> > > > > > even better solution would be to have a complete mapping, something that
> > > > > > works similar to "ranges" perhaps, like so:
> > > > > 
> > > > >       fb@80000000 {
> > > > >               reg = <0x80000000 0x01000000>;
> > > > >               iommu-ranges = <0x80000000 0x01000000 0x80000000>;
> > > > >       };
> > > > > 
> > > > > That would be for a full identity mapping, but we could also have
> > > > > something along the lines of this:
> > > > > 
> > > > >       fb@80000000 {
> > > > >               reg = <0x80000000 0x01000000>;
> > > > >               iommu-ranges = <0x80100000 0x00100000 0xa0000000>;
> > > > >       };
> > > > > 
> > > > > So that would only map a 1 MiB chunk at offset 1 MiB (of the physical
> > > > > reservation) to I/O virtual address 0xa0000000.
> > > > > 
> > > > > > > I'd like to keep references to IOMMUs out of this because they 
> > > > > > > would be
> > > > > > > duplicated. We will only use these nodes if they are referenced 
> > > > > > > by a
> > > > > > > device node that also has an iommus property. Also, the IOMMU 
> > > > > > > reference
> > > > > > > itself isn't enough. We'd also need to support the complete 
> > > > > > > specifier
> > > > > > > because you can have things like SIDs in there to specify the 
> > > > > > > exact
> > > > > > > address space that a device uses.
> > > > > > >
> > > > > > > Also, for some of these they may be reused independently of the 
> > > > > > > IOMMU
> > > > > > > address space. For example the Tegra framebuffer identity mapping 
> > > > > > > can
> > > > > > > be used by either of the 2-4 display controllers, each with (at 
> > > > > > > least
> > > > > > > potentially) their own address space. But we don't want to have to
> > > > > > > describe the identity mapping separately for each display 
> > > > > > > controller.
> > > > > > 
> > > > > > Okay, but I'd rather have to duplicate things in your case than not 
> > > > > > be
> > > > > > able to express some other case.
> > > > > 
> > > > > The earlier "separate iov-reserved-memory" proposal would be a good
> > > > > compromise here. It'd allow us to duplicate only the necessary bits
> > > > > (i.e. the IOVA mappings) but keep the common bits simple. And even
> > > > > the IOVA mappings could be shared for cases like identity mappings.
> > > > > See below for more on that.
> > > > > 
> > > > > > > Another thing to consider is that these nodes will often be added 
> > > > > > > by
> > > > > > > firmware (e.g. firmware will allocate the framebuffer and set up 
> > > > > > > the
> > > > > > > corresponding reserved memory region in DT). Wiring up references 
> > > > > > > like
> > > > > > > this would get very complicated very quickly.
> > > > > > 
> > > > > > Yes.
> > > > > > 
> > > > > > The using 'iommus' property option below can be optional and doesn't
> > > > > > have to be defined/supported now. Just trying to think ahead and not
> > > > > > be stuck with something that can't be extended.
> > > > > 
> > > > > One other benefit of the separate iov-reserved-memory node would be 
> > > > > that
> > > > > the iommus property could be simplified. If we have a physical
> > > > > reservation that needs to be accessed by multiple different display
> > > > > controllers, we'd end up with something fairly complex, such as this:
> > > > > 
> > > > >       fb: fb@80000000 {
> > > > >               reg = <0x80000000 0x01000000>;
> > > > >               iommus = <&dc0_iommu 0xa0000000 0x01000000>,
> > > > >                        <&dc1_iommu 0xb0000000 0x01000000>,
> > > > >                        <&dc2_iommu 0xc0000000 0x01000000>;
> > > > >       };
> > > > > 
> > > > > This would get even worse if we want to support partial mappings. 
> > > > > Also,
> > > > > it'd become quite complicated to correlate this with the memory-region
> > > > > references:
> > > > > 
> > > > >       dc0: dc@40000000 {
> > > > >               ...
> > > > >               memory-region = <&fb>;
> > > > >               iommus = <&dc0_iommu>;
> > > > >               ...
> > > > >       };
> > > > > 
> > > > > > So now you have to go match up the phandle (and potentially specifier)
> > > > > > in the iommus property of the dc0 node with an entry in the fb node's
> > > > > > iommus property. That's all fairly complicated stuff.
> > > > > 
> > > > > With separate iov-reserved-memory, this would be a bit more verbose, 
> > > > > but
> > > > > each individual node would be simpler:
> > > > > 
> > > > >       reserved-memory {
> > > > >               fb: fb@80000000 {
> > > > >                       reg = <0x80000000 0x01000000>;
> > > > >               };
> > > > >       };
> > > > > 
> > > > >       iov-reserved-memory {
> > > > >               fb0: fb@80000000 {
> > > > >                       /* identity mapping, "reg" optional? */
> > > > >                       reg = <0x80000000 0x01000000>;
> > > > >                       memory-region = <&fb>;
> > > > >               };
> > > > > 
> > > > >               fb1: fb@90000000 {
> > > > >                       /* but doesn't have to be */
> > > > >                       reg = <0x90000000 0x01000000>;
> > > > >                       memory-region = <&fb>;
> > > > >               };
> > > > > 
> > > > >               fb2: fb@a0000000 {
> > > > >                       /* can be partial, too */
> > > > >                       ranges = <0x80000000 0x00800000 0xa0000000>;
> > > > >                       memory-region = <&fb>;
> > > > >               };
> > > > >       }
> > > > > 
> > > > >       dc0: dc@40000000 {
> > > > >               iov-memory-regions = <&fb0>;
> > > > >               /* optional? */
> > > > >               memory-region = <&fb>;
> > > > >               iommus = <&dc0_iommu>;
> > > > >       };
> > > > > 
> > > > > Alternatively, if we want to support partial mappings, we could 
> > > > > replace
> > > > > those reg properties by ranges properties that I showed earlier. We 
> > > > > may
> > > > > even want to support both. Use "reg" for virtual-only reservations and
> > > > > identity mappings, or "simple partial mappings" (that map a sub-region
> > > > > starting from the beginning). Identity mappings could still be
> > > > > simplified by just omitting the "reg" property. For more complicated
> > > > > mappings, such as the ones on M1, the "ranges" property could be used.
> > > > > 
> > > > > Note how this looks a bit boilerplate-y, but it's actually really 
> > > > > quite
> > > > > simple to understand, even for humans, I think.
> > > > > 
> > > > > Also, the phandles in this are comparatively easy to wire up because
> > > > > they can all be generated in a hierarchical way: generate physical
> > > > > reservation and store phandle, then generate I/O virtual reservation
> > > > > to reference that phandle and store the new phandle as well. Finally,
> > > > > wire this up to the display controller (using either the IOV phandle 
> > > > > or
> > > > > both).
> > > > > 
> > > > > Granted, this requires the addition of a new top-level node, but given
> > > > > how expressive this becomes, I think it might be worth a second
> > > > > consideration.
> > > > 
> > > > > I guess as a middle-ground between your suggestion and mine, we could
> > > > > also move the IOV nodes back into reserved-memory. We'd just need to make
> > > > > sure the names (together with unit-addresses) are unique, to support cases
> > > > > where we want to identity map, or have multiple mappings at the same
> > > > > address. So it'd look something like this:
> > > > 
> > > >         reserved-memory {
> > > >                 fb: fb@80000000 {
> > > >                         reg = <0x80000000 0x01000000>;
> > > >                 };
> > > > 
> > > >                 audio-firmware@ff000000 {
> > > >                         /* perhaps add "iommu-reserved" for this case */
> > > >                         compatible = "iommu-mapping";
> > > >                         /*
> > > >                          * no memory-region referencing a physical
> > > >                          * reservation, indicates that this is an
> > > >                          * IOMMU reservation, rather than a mapping
> > > >                          /
> > > >                         reg = <0xff000000 0x01000000>;
> > > >                 };
> > > > 
> > > >                 fb0: fb-mapping@80000000 {
> > > >                         compatible = "iommu-mapping";
> > > >                         /* identity mapping, "reg" optional? */
> > > >                         reg = <0x80000000 0x01000000>;
> > > >                         memory-region = <&fb>;
> > > >                 };
> > > > 
> > > >                 fb1: fb-mapping@90000000 {
> > > >                         compatible = "iommu-mapping";
> > > >                         /* but doesn't have to be */
> > > >                         reg = <0x90000000 0x01000000>;
> > > >                         memory-region = <&fb>;
> > > >                 };
> > > > 
> > > >                 fb2: fb-mapping@a0000000 {
> > > >                         compatible = "iommu-mapping";
> > > >                         /* can be partial, too */
> > > >                         ranges = <0xa0000000 0x00800000 0x80000000>;
> > > >                         memory-region = <&fb>;
> > > >                 };
> > > >         }
> > > > 
> > > >         dc0: dc@40000000 {
> > > >                 memory-region = <&fb0>;
> > > >                 iommus = <&dc0_iommu>;
> > > >         };
> > > > 
> > > > What do you think?
> > > 
> > > I converted the Apple M1 display controller driver to using reserved 
> > > regions using these bindings. It is sufficient for the needs of the M1 
> > > display controller which is so far the only device requiring this.
> > 
> > Thanks for trying this out. I've been meaning to resume this discussion
> > to finally get closure because we really want to enable this for various
> > Tegra SoCs.
> > 
> > > I encountered two problems with this bindings proposal:
> > > 
> > > 1) It is impossible to express which iommu needs to be used if a device 
> > > has multiple "iommus" specified. This is on the M1 only a theoretical 
> > > problem as the display co-processor devices use a single iommu.
> > 
> > From what I recall this is something that we don't fully support either
> > way. If you've got a struct device and you want to allocate DMA'able
> > memory, you can only pass that struct device to the DMA API upon
> > allocation but you have no way of specifying separate instances
> > depending on use-case.
> 
> Ok, let's ignore my complicated proposal then. It is not a problem we 
> need to solve for the M1.
> 
> > > > 2) The reserved regions cannot easily be looked up at iommu probe 
> > > > time.  The Apple M1 iommu driver resets the iommu at probe. This 
> > > > breaks the framebuffer. The display controller appears to crash when 
> > > > an active scan-out framebuffer is unmapped. Resetting the iommu 
> > > > looks like a sensible approach though.
> > > 
> > > > To work around this I added a custom property to the affected iommu node 
> > > > to avoid the reset. This doesn't feel correct since the reason to avoid 
> > > the reset is that we have to maintain the reserved regions mapping until 
> > > the display controller driver takes over.
> > > As far as I can see the only method to retrieve devices with reserved 
> > > memory from the iommu is to iterate over all devices. This looks 
> > > impractical. The M1 has over 20 distinct iommus.
> > 
> > Do I understand correctly that on the M1, the firmware sets up a mapping
> > in the IOMMU already and then you want to recreate that mapping after
> > the IOMMU driver has reset the IOMMU?
> 
> The mappings are already set up by firmware as it uses the frame buffer 
> already itself. We need to make the kernel aware of the existing mapping 
> so it can use the IOMMU. Using reserved memory regions and mappings 
> seems to be a clean way to do this. We want to reset IOMMUs without 
> pre-existing mappings (the M1 has over 20 IOMMUs). We need a way to 
> identify the two IOMMUs which must not be reset at driver probe time.  
> A simple property in the IOMMU node would be enough. It would duplicate 
> information though since the only reason why we can't reset the IOMMU is 
> the pre-existing mapping.
> 
> > In that case, how do you make sure that you atomically transition from
> > the firmware mapping to the kernel mapping? As soon as you reset the
> > > IOMMU, the display controller will cause IOMMU faults because it's now
> > scanning out from an unmapped buffer, right?
> 
> We are replacing the entire firmware managed page table with a kernel 
> managed one with a TTBR MMIO register write. The second IOMMU with 
> pre-existing mapping has unfortunately the TTBR locked. Dealing with 
> this is more complicated but the device using this IOMMU appears to
> sleep.
> 
> > So that approach of avoiding the reset doesn't seem wrong to me.
> > Obviously that's not altogether trivial to do either. Typically the
> > IOMMU mappings would be contained in system memory, so you'd have to
> > reserve those via reserved-memory nodes as well, etc.
> 
> The system memory is currently not expressed as reserved-memory but 
> simply outside of the specified memory.
>  
> > > One way to avoid both problems would be to move the mappings to the 
> > > iommu node as sub nodes. The device would then reference those.  
> > > This way the mapping is readily available at iommu probe time and 
> > > adding iommu type specific parameters to map the region correctly is 
> > > possible.
> > > 
> > > > The sample above would transform to:
> > > 
> > >   reserved-memory {
> > >           fb: fb@80000000 {
> > >                   reg = <0x80000000 0x01000000>;
> > >           };
> > >   };
> > > 
> > >   dc0_iommu: iommu@20000000 {
> > >           #iommu-cells = <1>;
> > > 
> > >           fb0: fb-mapping@80000000 {
> > >                   compatible = "iommu-mapping";
> > >                   /* identity mapping, "reg" optional? */
> > >                   reg = <0x80000000 0x01000000>;
> > >                   memory-region = <&fb>;
> > >                   device-id = <0>; /* for #iommu-cells*/
> > >           };
> > > 
> > >           fb1: fb-mapping@90000000 {
> > >                   compatible = "iommu-mapping";
> > >                   /* but doesn't have to be */
> > >                   reg = <0x90000000 0x01000000>;
> > >                   memory-region = <&fb>;
> > >                   device-id = <1>; /* for #iommu-cells*/
> > >           };
> > >   };
> > > 
> > >   dc0: dc@40000000 {
> > >           iommu-region = <&fb0>;
> > >           iommus = <&dc0_iommu 0>;
> > >   };
> > > 
> > > Does anyone see problems with this approach or can think of something 
> > > better?
> > 
> > The device tree description of this looks a bit weird because it
> > sprinkles things all around. For instance now we've got the "stream ID"
> > (i.e. what you seem to be referring to as "device-id") in two places,
> > once in the iommus property of the DC node and once in the mapping.
> 
> Yes, stream_id would be the device-id. It is the term used in the 
> apple-dart IOMMU driver. It is duplicated to deal with the multiple 
> IOMMU problem. Let's ignore that and scrap my proposal.
>  
> > Would it work if you added back-references to the devices that are
> > active on boot to the IOMMU node? Something along these lines:
> > 
> >     reserved-memory {
> >             fb: fb@80000000 {
> >                     reg = <0x80000000 0x01000000>;
> >             };
> >     };
> > 
> >     dc0_iommu: iommu@20000000 {
> >             #iommu-cells = <1>;
> > 
> >             mapped-devices = <&dc0>;
> >     };
> > 
> >     dc0: dc@40000000 {
> > >             memory-region = <&fb>;
> >             iommus = <&dc0_iommu 0>;
> >     };
> > 
> > Depending on how you look at it that's a circular dependency, but it
> > won't be in practice. It makes things a bit more compact and puts the
> > data where it belongs.
> 
> Yes, this works for the Apple M1 display co-processor. I've changed the 
> dts and my apple-dart private parsing code to use "mapped-devices" 
> back-references and it works as before. We probably need an automated 
> check to ensure the references between device and IOMMU remain 
> consistent.

Circling back to this... again. I've been thinking about this some more
and have come up with a mix between what Rob, Janne and I had proposed.
This is how it would look (based on Tegra210):

        reserved-memory {
                fb: framebuffer@80000000 {
                        /*
                         * Physical memory region that is reserved. If
                         * this property is omitted, this region should
                         * be treated as an IOVA reservation.
                         */
                        reg = <0x80000000 0x01000000>;

                        /*
                         * Create 1:1 mapping for display controller.
                         *
                         * Note how instead of the IOMMU reference we
                         * actually pass the device reference here. This
                         * combines the "mapped-devices" property that
                         * was proposed earlier and makes it easier to
                         * find the device that needs this mapping. The
                         * IOMMU phandle and specifier can be obtained
                         * via this backlink to the consumer device.
                         *
                         * More than one entry could be specified here
                         * to allow mappings for multiple devices. This
                         * avoids the problem of having multiple nodes
                         * with the same name.
                         *
                         * Could also be "iommu-addresses" as Rob had
                         * suggested earlier, but "iommu-mapping" seems
                         * a bit more appropriate given that there's
                         * also the phandle now.
                         */
                        iommu-mapping = <&dc 0x80000000 0x01000000>;
                };
        };

        mc: memory-controller@70019000 {
                ...
                #iommu-cells = <1>;
                ...
        };

        dc: dc@54200000 {
                ...
                iommus = <&mc TEGRA_SWGROUP_DC>;

                /*
                 * As in earlier proposals, this could be optional if
                 * all we need is the IOMMU mapping. It can be specified
                 * if there's a need for the driver to use the physical
                 * memory region (i.e. to copy out existing framebuffer
                 * content and recycle memory).
                 */
                memory-region = <&fb>;
                ...
        };
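
As a rough, untested sketch of the two variations mentioned in the
comments above: a region mapped for more than one device, and an
IOVA-only reservation that omits "reg". Note that "iommu-mapping" is
only the property name proposed in this mail, not an existing binding.
The second display controller label (dcb), the IOVA values other than
the 1:1 one, and the "iova-reserved" node name are made up for
illustration. One address cell and one size cell are assumed, as in the
example above.

```dts
reserved-memory {
	fb: framebuffer@80000000 {
		/* physical reservation, 16 MiB at 0x80000000 */
		reg = <0x80000000 0x01000000>;

		/*
		 * One entry per consumer device: 1:1 for dc, and a
		 * different (hypothetical) IOVA for dcb.
		 */
		iommu-mapping = <&dc  0x80000000 0x01000000>,
				<&dcb 0x90000000 0x01000000>;
	};

	iova-reserved {
		/*
		 * No "reg": nothing is reserved in physical memory,
		 * this only keeps the IOVA range out of the allocator
		 * for dc's address space.
		 */
		iommu-mapping = <&dc 0xff000000 0x01000000>;
	};
};
```

With a layout like this, each device still only carries its usual
iommus property, and the mapping is found by following the phandle in
iommu-mapping back to the consumer.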

One last remaining question is whether we also need some sort of
#address-cells and #size-cells for the IOMMU, so that we can determine
how many cells the address and size in each iommu-mapping entry have.
I suppose we could derive that from the dma-ranges property somehow,
since that defines the addressable region of the device that needs the
mapping.
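
To illustrate why the cell counts matter, here is the same 16 MiB
mapping written two ways. This is only a sketch: the two-cell variant
is a hypothetical encoding, and nothing in this proposal defines yet
where those counts would come from. The two nodes are alternatives and
are not meant to appear in the same tree.

```dts
/* Variant A: one address cell, one size cell */
framebuffer@80000000 {
	reg = <0x80000000 0x01000000>;
	iommu-mapping = <&dc 0x80000000 0x01000000>;
};

/* Variant B: two address cells, two size cells (assumed) */
framebuffer@80000000 {
	reg = <0x0 0x80000000 0x0 0x01000000>;
	iommu-mapping = <&dc 0x0 0x80000000 0x0 0x01000000>;
};
```

Without knowing the counts, a parser can't tell whether the cells
following the phandle in variant B are one entry or the start of
several.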

Thierry
