Hi Jan, Andrea,

Thanks for looping me in on this topic!

Just a note that in my version I don't recreate only the root page, but the entire page table tree, to be NC for the SMMU. That immediately gave me a stable system on the ZCU102. I have tested that even with a dynamically recolored root cell. I haven't tested with colored DMA-capable inmates, but I'll do that soon, as I have just added support to boot colored Linux inmates. I keep the SMMU and CPU tables in sync by duplicating every change made by map/unmap operations at cell creation time.

@Jan, when I started working on my current repo, I did it to quickly port JH to a dev board I needed to work on (the NXP S32V234). It just so happened that I branched out quite a bit. I am quite happy if some of my effort helps with upstream dev. I just don't have too many cycles at the moment to propose well-thought-out patches. I am actually looking forward to having this conversation in a few weeks.

Cheers,
Renato

On Thu, Oct 29, 2020, 4:53 AM Jan Kiszka <[email protected]> wrote:

> On 29.10.20 09:39, Andrea Bastoni wrote:
> > On 29/10/2020 07:36, Jan Kiszka wrote:
> >> On 28.10.20 22:29, Andrea Bastoni wrote:
> >>> Hi,
> >>>
> >>> On 28/10/2020 21:14, Jan Kiszka wrote:
> >>>> On 27.10.20 10:22, Jan Kiszka wrote:
> >>>>> On 27.10.20 02:25, Peng Fan wrote:
> >>>>>> Jan,
> >>>>>>
> >>>>>>> Subject: Re: [PATCH v2 00/46] arm64: Rework SMMUv2 support
> >>>>>>>
> >>>>>>> On 14.10.20 10:28, Jan Kiszka wrote:
> >>>>>>>> Changes in v2:
> >>>>>>>> - map 52-bit parange to 48
> >>>>>>>>
> >>>>>>>> That wasn't the plan when I started, but the more I dug into the
> >>>>>>>> details and started to understand the hardware, the more issues I
> >>>>>>>> found and the more dead code fragments from the Linux usage became
> >>>>>>>> visible.
> >>>>>>>>
> >>>>>>>> Highlights of the outcome:
> >>>>>>>> - Fix stall of SMMU due to unhandled stalled contexts (took me a while
> >>>>>>>>   to understand that...)
> >>>>>>>> - Fix programming of CBn_TCR and TTBR
> >>>>>>>> - Fix TLB flush on cell exit
> >>>>>>>> - Fix bogus handling of Extended StreamID support
> >>>>>>>> - Do not pass-through unknown streams
> >>>>>>>> - Disable SMMU on shutdown
> >>>>>>>> - Reassign StreamIDs to the root cell
> >>>>>>>> - 225 insertions(+), 666 deletions(-)
> >>>>>>>>
> >>>>>>>> The code works as expected on the Ultra96-v2 here, but due to all the
> >>>>>>>> time that went into the rework, I had no chance to bring up my MX8QM
> >>>>>>>> so far. I'm fairly optimistic that things are not broken there as
> >>>>>>>> well, but if they are, bisecting should be rather simple with this
> >>>>>>>> series. So please test and review.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Alice, Peng, already had a chance to review or test (ie. next)?
> >>>>>>
> >>>>>> I gave a test, sometimes I met SDHC ADMA error when
> >>>>>> `jailhouse enable imx8qm.cell`, sometimes it work well.
> >>>>>>
> >>>>>> I suspect when during jailhouse enable phase, there might be
> >>>>>> ongoing sdhc transactions not finished, not sure.
> >>>>>>
> >>>>>> I have not find time to look into details.
> >>>>>>
> >>>>>> Anyway, you could check in to master I think, we could address
> >>>>>> the issue later when I have time.
> >>>>>>
> >>>>>
> >>>>> Hmm, I would still like to understand this first... Do you have the
> >>>>> chance to bisect this effect to a commit? Otherwise, I guess I finally
> >>>>> need to get my board running.
> >>>>>
> >>>>
> >>>> It's running now (quite some effort due to the incomplete upstream state
> >>>> - e.g. upstream u-boot runs but cannot boot all downstream kernels...),
> >>>> but I wasn't able to reproduce startup issues. Shutting down Jailhouse
> >>>> often hangs, though, at least restarting does all the time. And that
> >>>> even with next. Seems we still do not properly turn off/on something here.
> >>>>
> >>>> Interestingly, this issue was not present on the zynqmp.
> >>>
> >>> On a different version of the SMMUv2 developed @ Boston University (Renato in
> >>> CC), re-using the same root page table as the cell created problems due to
> >>> different attributes (uncached) needed by some devices.
> >>
> >> Why are so many folks working downstream on such essential things? Not
> >> helpful, for everyone, even if the goal should be "only" experimental
> >> results.
> >>
> >>>
> >>>> diff --git a/hypervisor/arch/arm64/smmu.c b/hypervisor/arch/arm64/smmu.c
> >>>> index 41c0ffb4..60743bc0 100644
> >>>> --- a/hypervisor/arch/arm64/smmu.c
> >>>> +++ b/hypervisor/arch/arm64/smmu.c
> >>>> @@ -220,6 +220,7 @@ static void arm_smmu_setup_context_bank(struct arm_smmu_device *smmu,
> >>>>         mmio_write32(cb_base + ARM_SMMU_CB_TCR, VTCR_CELL & ~TCR_RES0);
> >>>>
> >>>>         /* TTBR0 */
> >>>> +       /* Here */
> >>>>         mmio_write64(cb_base + ARM_SMMU_CB_TTBR0,
> >>>>                      paging_hvirt2phys(cell->arch.mm.root_table) & TTBR_MASK);
> >>>
> >>> The issue in the BU version was solved by allocating a new page for this.
> >>>
> >>
> >> Only the root level? How were those entries different?
> >
> > Only the root level. IIRC, NC by default, instead of Normal.
> >
> >>> I wanted to check this effect for the version on next, but didn't find the time
> >>> to do it so far :/
> >>>
> >>
> >> How was the issue triggered?
> >
> > From the discussions I had, on the ZCU102, devices were randomly triggering
> > erros/ stopped working.
> >
>
> I just ran a enable/disable loop aside flood-ping + dd on the Ultra96-v2
> (I would expect it to be identical to the ZCU102 in this regard), and
> that did not trigger any (visible) issues yet. I'll retry with lowering
> the enable frequency.
>
> Jan
>
> >
> >>
> >>
> >> I made some progress meanwhile: Linux was also using the SMMU. I'll send
> >> a patch shortly that detects that, like we already in VT-d at least.
> >> Interestingly, this should have been broken on the Ultra96 as well, just
> >> didn't notice.
> >>
> >> With that, I'm running enable/disable loops now, but I lose my Ethernet
> >> link after a while. Returns after ifdown/up, and the system looks fine
> >> otherwise. Seems as if we drop transactions in the transition phase.
> >> However, a dd running in parallel was not triggering any issues.
> >>
> >> Jan
> >>
> >
>
> --
> Siemens AG, T RDA IOT
> Corporate Competence Center Embedded Linux
>
