On 03/07/19 4:24 PM, Jan Kiszka wrote: > On 03.07.19 12:33, Pratyush Yadav wrote: >> >> >> On 03/07/19 3:26 PM, Jan Kiszka wrote: >>> On 03.07.19 11:16, Pratyush Yadav wrote: >>>> >>>> >>>> On 02/07/19 8:58 PM, Jan Kiszka wrote: >>>>> On 02.07.19 16:57, Pratyush Yadav wrote: >>>>>> >>>>>> >>>>>> On 02/07/19 8:12 PM, Jan Kiszka wrote: >>>>>>> On 02.07.19 16:36, Pratyush Yadav wrote: >>>>>>>> A System Memory Management Unit(SMMU) performs a task analogous to a >>>>>>>> CPU's MMU, translating addresses for device requests from system I/O >>>>>>>> devices before the requests are passed into the system interconnect. >>>>>>>> >>>>>>>> Implement a driver for SMMU v3 that maps and unmaps memory for >>>>>>>> specified >>>>>>>> stream ids. >>>>>>>> >>>>>>>> An emulated SMMU is presented to inmates by trapping access to the MMIO >>>>>>>> registers to enable stage 1 translations. Accesses to the SMMU memory >>>>>>>> mapped registers are trapped and then routed to the emulated SMMU. This >>>>>>>> is not emulation in the sense that we fully emulate the device top to >>>>>>>> bottom. The emulation is used to provide an interface to the SMMU that >>>>>>>> the hypervisor can control to make sure the inmates are not doing >>>>>>>> anything they should not. The actual translations are done by hardware. >>>>>>>> >>>>>>>> Emulation is needed because both stage 1 and stage 2 parameters are >>>>>>>> configured in a single data structure, the stream table entry. For this >>>>>>>> reason, the inmates can't be allowed to directly control the stream >>>>>>>> table entries, and by extension, the stream table. >>>>>>>> >>>>>>>> The guest cells are assigned stream IDs in their configs and only those >>>>>>>> assigned stream IDs can be used by the cells. There is no checking in >>>>>>>> place to make sure two cells do not use the same stream IDs. This must >>>>>>>> be taken care of when creating the cell configs. >>>>>>>> >>>>>>>> This driver is implemented based on the following assumptions: >>>>>>>> - Running on a Little endian 64 bit core compatible with ARM v8 >>>>>>>> architecture. >>>>>>>> - SMMU supporting only AARCH64 mode. >>>>>>>> - SMMU AARCH 64 stage 2 translation configurations are compatible with >>>>>>>> ARMv8 VMSA. So re-using the translation tables of CPU for SMMU. >>>>>>>> >>>>>>>> Work left to do: >>>>>>>> - Route event notifications to the correct cell and identify which >>>>>>>> event >>>>>>>> needs to go to which cell. >>>>>>>> - Add support for IRQ and MSI routing. >>>>>>>> - Add support for PRI queues and ATS. >>>>>>>> - Identify which cell caused a command queue error and notify it. >>>>>>>> - Support sub-streams. >>>>>>>> >>>>>>>> A lot of the work left is optional features that the SMMU provides like >>>>>>>> substreams, ATS, PRI. There is little reason to add them unless there >>>>>>>> is >>>>>>>> a use case for them. >>>>>>> >>>>>>> One quick question again, I already had it for the RFC round: Would it >>>>>>> be tricky to split up this patch into single-stage only + 2-stage >>>>>>> support? That would allow me to asses the additional complexity we >>>>>>> import by adding 2-stage support. Or is 2-stage support inherently >>>>>>> coupled with the SMMU design so that such a split-up would neither make >>>>>>> sense nor buy us anything? >>>>>> >>>>>> I considered splitting, but as far as I see, stage 1 and 2 are rather >>>>>> tightly coupled. If you want, I can split it into two parts where part 1 >>>>>> is the defines, data structure declarations, and initialization. Part 2 >>>>>> will have stage 1 emulation and stage 2. >>>>> >>>>> If part 1 will get away without emulating/intercepting bits of the SMMU, >>>>> I bet it will be a win. >>>> >>>> It will. >>>> >>>>>> >>>>>> Let me know if you'd prefer that, and I'll send the series tomorrow. >>>>>> It's getting late and I'm about to leave for today. >>>>> >>>>> Sure, no hurry! >>>>> >>>>>> >>>>>>> Background is that the majority of use case I see will not need more >>>>>>> than one stage. I particular, you have no need for 2-stage support in >>>>>>> simple bare-metal or RTOS cells, leaving this only potentially relevant >>>>>>> for the root cell (or secondary Linux cells). If the feature is complex >>>>>>> and can be disabled, we could skip it, reducing the code size. >>>>>> >>>>>> Hm, as far as I understand, stage 2 should be always needed because you >>>>>> need to translate from IPA to PA every time. Stage 1 could be optional >>>>>> if the guest OS handles the scattering or gathering of the buffers. But >>>>>> don't you always need to translate from IPA to PA (even though in case >>>>>> of Jailhouse those translations are almost 1:1, and IPA == PA)? When >>>>>> would you not need stage 2? >>>>>> >>>>>> Anyway, most of the code is setting up the SMMU and the data structures, >>>>>> and stage 1 emulation. Stage 2 only part is very small. Removing it >>>>>> won't save you more than 10-15 lines. >>>>> >>>>> As noted above: If eliminating guest visibility of the SMMU helps getting >>>>> rid of related emulation and interception logic, I'm sure it will be more >>>>> than 10 lines. If we need 2 stages configuration-wise in order to use the >>>>> SMMU at all, those 2 stages need to stay, of course. >>>> >>>> Ah, I thought you meant removing stage 2 and keeping stage 1. >>>> >>>> The SMMU can work with either or both stages enabled. >>>> >>>>> But I'm not into the details yet, just deriving from the history on IOMMU >>>>> on x86 where you started with 1 stage (what we only support there so >>>>> far), and only later on a second one was added in order to allow guests >>>>> to map hardware directly while isolating it under own control (e.g. >>>>> protect against userspace-mapped devices). The latter is a bonus feature >>>>> with use cases, though not yet in Jailhouse context. >>>> >>>> Guest visibility of the SMMU is useful when you have a device not capable >>>> of scatter-gather operations (we have some on our boards, like the video >>>> decoder). They expect a contiguous buffer of memory to work. Adding stage >>>> 1 translations eliminates the need for large contiguous buffers. Another >>>> use case is when you have a multiple PCI devices under the same cell. You >>>> could easily protect memory regions of the two devices from each other >>>> with stage 1 translations. >>> >>> I get the first use case (legacy IP support - modern PCI devices no longer >>> have that issue, thus there was no need on x86 so far), but the second one >>> is exactly of that "nice-to-have" category which I consider not yet >>> relevant. >>> >>> On the downside is the fact that adding an emulation interface massively >>> widens the attack surface of the hypervisor. That is easy to get wrong, >>> just look at [1]. So I would really like to see this optional, either >>> build-time or runtime configurable, so that you can turn the interface off >>> if there is no device around anymore that needs it (I assume future IP will >>> remove such limitations more and more). >> >> A build-time configuration will need at least 3-4 sets of #ifdefs scattered >> throughout the driver, but it will reduce the binary size when the emulation >> is not used. Run time configuration will come at the cost of larger binary >> size, but only a couple of if statements need to be added. Of course, >> something in the cell config will also need to be added to tell whether to >> use both stages or not. >> >> Which would you prefer? > > For the beginning: The simplest approach for you (I would suspect #ifdef mess > will win). The key is the split-up. Then we will see clearer what needs to be > managed. Yes, #ifdef wins. I am working on the changes suggested in the first patch. If you want, I can send the series without all comments addressed, but with the SMMU driver split into two parts. You can start reviewing it while I fix the other things. Then I'll follow up with all the fixes. Let me know if that'd be better. > Jan > -- Regards, Pratyush Yadav -- You received this message because you are subscribed to the Google Groups "Jailhouse" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/jailhouse-dev/2ee45bc1-ea50-db76-3329-2f5350e6793f%40ti.com. For more options, visit https://groups.google.com/d/optout.
Re: [PATCH 6/6] arm64: iommu: smmu-v3: Add support for stage 1 and 2 translations
'Pratyush Yadav' via Jailhouse Wed, 03 Jul 2019 04:06:27 -0700
- Re: [PATCH 1/6] iommu: x86: Genera... 'Pratyush Yadav' via Jailhouse
- Re: [PATCH 1/6] iommu: x86: Ge... Jan Kiszka
- [PATCH 6/6] arm64: iommu: smmu-v3: Add supp... 'Pratyush Yadav' via Jailhouse
- Re: [PATCH 6/6] arm64: iommu: smmu-v3:... Jan Kiszka
- Re: [PATCH 6/6] arm64: iommu: smmu... 'Pratyush Yadav' via Jailhouse
- Re: [PATCH 6/6] arm64: iommu: ... Jan Kiszka
- Re: [PATCH 6/6] arm64: iom... 'Pratyush Yadav' via Jailhouse
- Re: [PATCH 6/6] arm64... Jan Kiszka
- Re: [PATCH 6/6] a... 'Pratyush Yadav' via Jailhouse
- Re: [PATCH 6/6] a... Jan Kiszka
- Re: [PATCH 6/6] a... 'Pratyush Yadav' via Jailhouse
- Re: [PATCH 6/6] a... Jan Kiszka
- [PATCH 3/6] core: Add stream id list in the... 'Pratyush Yadav' via Jailhouse
- Re: [PATCH 3/6] core: Add stream id li... Jan Kiszka
- Re: [PATCH 3/6] core: Add stream i... 'Pratyush Yadav' via Jailhouse
