Hi Jan, Andrea,

Thanks for looping me in on this topic!

Just a note that in my version I don't just recreate the root page, but
the entire page table tree, marked NC for the SMMU. That immediately gave me
a stable system on the ZCU102. I have tested that even with a dynamically
recolored root cell. I haven't tested with colored DMA-capable inmates yet,
but I'll do that soon, as I have just added support for booting colored Linux
inmates. I keep the SMMU and CPU tables in sync by duplicating any change in
map/unmap operations at cell creation time.
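
To illustrate the idea of mirroring every map/unmap into a second table
with different attributes, here is a rough sketch. It is purely
illustrative and not Jailhouse code: it uses a toy flat table instead of
the real multi-level tree, and every name in it (toy_cell, toy_cell_map,
toy_cell_unmap, the attribute values) is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute values, not the real descriptor encoding. */
enum toy_attr { TOY_ATTR_NORMAL = 0, TOY_ATTR_NC = 1 };

#define TOY_PAGE_SHIFT 12
#define TOY_NUM_PAGES  16

struct toy_entry {
	uint64_t phys;
	enum toy_attr attr;
	int valid;
};

/* Toy flat table: one entry per 4K page, standing in for a full tree. */
struct toy_table {
	struct toy_entry e[TOY_NUM_PAGES];
};

/* One cell owns two independent tables: one for CPUs, one for the SMMU. */
struct toy_cell {
	struct toy_table cpu;
	struct toy_table smmu;
};

static void toy_set(struct toy_table *t, size_t idx, uint64_t phys,
		    enum toy_attr attr, int valid)
{
	assert(t != NULL && idx < TOY_NUM_PAGES);
	t->e[idx].phys = phys;
	t->e[idx].attr = attr;
	t->e[idx].valid = valid;
}

/* Map one page in both tables: Normal for the CPU, NC for the SMMU. */
int toy_cell_map(struct toy_cell *c, uint64_t virt, uint64_t phys)
{
	size_t idx = (size_t)(virt >> TOY_PAGE_SHIFT);

	if (idx >= TOY_NUM_PAGES)
		return -1; /* reject before touching either table */

	toy_set(&c->cpu, idx, phys, TOY_ATTR_NORMAL, 1);
	toy_set(&c->smmu, idx, phys, TOY_ATTR_NC, 1);
	return 0;
}

/* Unmap one page from both tables so they never diverge. */
int toy_cell_unmap(struct toy_cell *c, uint64_t virt)
{
	size_t idx = (size_t)(virt >> TOY_PAGE_SHIFT);

	if (idx >= TOY_NUM_PAGES)
		return -1;

	toy_set(&c->cpu, idx, 0, TOY_ATTR_NORMAL, 0);
	toy_set(&c->smmu, idx, 0, TOY_ATTR_NORMAL, 0);
	return 0;
}
```

The point is only that the range check happens once and both tables are
then updated together, so the SMMU view can carry its own attributes
without ever drifting from the CPU view.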

@Jan, when I started working on my current repo, I did it to port JH to a
dev board I needed to work on quickly (the NXP S32V234). It just so
happened that I branched out quite a bit. I would be quite happy if some of
my effort helps with upstream dev. I just don't have too many cycles at the
moment to propose well-thought-out patches. I am actually looking forward
to having this conversation in a few weeks.

Cheers,
Renato




On Thu, Oct 29, 2020, 4:53 AM Jan Kiszka <[email protected]> wrote:

> On 29.10.20 09:39, Andrea Bastoni wrote:
> > On 29/10/2020 07:36, Jan Kiszka wrote:
> >> On 28.10.20 22:29, Andrea Bastoni wrote:
> >>> Hi,
> >>>
> >>> On 28/10/2020 21:14, Jan Kiszka wrote:
> >>>> On 27.10.20 10:22, Jan Kiszka wrote:
> >>>>> On 27.10.20 02:25, Peng Fan wrote:
> >>>>>> Jan,
> >>>>>>
> >>>>>>> Subject: Re: [PATCH v2 00/46] arm64: Rework SMMUv2 support
> >>>>>>>
> >>>>>>> On 14.10.20 10:28, Jan Kiszka wrote:
> >>>>>>>> Changes in v2:
> >>>>>>>>  - map 52-bit parange to 48
> >>>>>>>>
> >>>>>>>> That wasn't the plan when I started, but the more I dug into the
> >>>>>>>> details and started to understand the hardware, the more issues I
> >>>>>>>> found and the more dead code fragments from the Linux usage became
> >>>>>>> visible.
> >>>>>>>>
> >>>>>>>> Highlights of the outcome:
> >>>>>>>>  - Fix stall of SMMU due to unhandled stalled contexts (took me a
> >>>>>>>>    while to understand that...)
> >>>>>>>>  - Fix programming of CBn_TCR and TTBR
> >>>>>>>>  - Fix TLB flush on cell exit
> >>>>>>>>  - Fix bogus handling of Extended StreamID support
> >>>>>>>>  - Do not pass-through unknown streams
> >>>>>>>>  - Disable SMMU on shutdown
> >>>>>>>>  - Reassign StreamIDs to the root cell
> >>>>>>>>  - 225 insertions(+), 666 deletions(-)
> >>>>>>>>
> >>>>>>>> The code works as expected on the Ultra96-v2 here, but due to all the
> >>>>>>>> time that went into the rework, I had no chance to bring up my MX8QM
> >>>>>>>> so far. I'm fairly optimistic that things are not broken there as
> >>>>>>>> well, but if they are, bisecting should be rather simple with this
> >>>>>>>> series. So please test and review.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Alice, Peng, already had a chance to review or test (ie. next)?
> >>>>>>
> >>>>>> I gave a test, sometimes I met SDHC ADMA error when
> >>>>>> `jailhouse enable imx8qm.cell`, sometimes it work well.
> >>>>>>
> >>>>>> I suspect when during jailhouse enable phase, there might be
> >>>>>> ongoing sdhc transactions not finished, not sure.
> >>>>>>
> >>>>>> I have not found time to look into details.
> >>>>>>
> >>>>>> Anyway, you could check in to master I think, we could address
> >>>>>> the issue later when I have time.
> >>>>>>
> >>>>>
> >>>>> Hmm, I would still like to understand this first... Do you have the
> >>>>> chance to bisect this effect to a commit? Otherwise, I guess I finally
> >>>>> need to get my board running.
> >>>>>
> >>>>
> >>>> It's running now (quite some effort due to the incomplete upstream state
> >>>> - e.g. upstream u-boot runs but cannot boot all downstream kernels...),
> >>>> but I wasn't able to reproduce startup issues. Shutting down Jailhouse
> >>>> often hangs, though, at least restarting does all the time. And that
> >>>> even with next. Seems we still do not properly turn off/on something here.
> >>>>
> >>>> Interestingly, this issue was not present on the zynqmp.
> >>>
> >>> On a different version of the SMMUv2 developed @ Boston University (Renato in
> >>> CC), re-using the same root page table as the cell created problems due to
> >>> different attributes (uncached) needed by some devices.
> >>
> >> Why are so many folks working downstream on such essential things? Not
> >> helpful, for everyone, even if the goal should be "only" experimental
> >> results.
> >>
> >>>
> >>>> diff --git a/hypervisor/arch/arm64/smmu.c b/hypervisor/arch/arm64/smmu.c
> >>>> index 41c0ffb4..60743bc0 100644
> >>>> --- a/hypervisor/arch/arm64/smmu.c
> >>>> +++ b/hypervisor/arch/arm64/smmu.c
> >>>> @@ -220,6 +220,7 @@ static void arm_smmu_setup_context_bank(struct arm_smmu_device *smmu,
> >>>>         mmio_write32(cb_base + ARM_SMMU_CB_TCR, VTCR_CELL & ~TCR_RES0);
> >>>>
> >>>>         /* TTBR0 */
> >>>> +       /* Here */
> >>>>         mmio_write64(cb_base + ARM_SMMU_CB_TTBR0,
> >>>>                      paging_hvirt2phys(cell->arch.mm.root_table) & TTBR_MASK);
> >>>
> >>> The issue in the BU version was solved by allocating a new page for this.
> >>>
> >>
> >> Only the root level? How were those entries different?
> >
> > Only the root level. IIRC, NC by default, instead of Normal.
> >
> >>> I wanted to check this effect for the version on next, but didn't find the time
> >>> to do it so far :/
> >>>
> >>
> >> How was the issue triggered?
> >
> > From the discussions I had, on the ZCU102, devices were randomly triggering
> > errors / stopped working.
> >
>
> I just ran an enable/disable loop aside flood-ping + dd on the Ultra96-v2
> (I would expect it to be identical to the ZCU102 in this regard), and
> that did not trigger any (visible) issues yet. I'll retry with lowering
> the enable frequency.
>
> Jan
>
> >
> >>
> >>
> >> I made some progress meanwhile: Linux was also using the SMMU. I'll send
> >> a patch shortly that detects that, like we already do in VT-d at least.
> >> Interestingly, this should have been broken on the Ultra96 as well, I just
> >> didn't notice.
> >>
> >> With that, I'm running enable/disable loops now, but I lose my Ethernet
> >> link after a while. Returns after ifdown/up, and the system looks fine
> >> otherwise. Seems as if we drop transactions in the transition phase.
> >> However, a dd running in parallel was not triggering any issues.
> >>
> >> Jan
> >>
> >
>
> --
> Siemens AG, T RDA IOT
> Corporate Competence Center Embedded Linux
>
