On 29.10.20 09:39, Andrea Bastoni wrote:
> On 29/10/2020 07:36, Jan Kiszka wrote:
>> On 28.10.20 22:29, Andrea Bastoni wrote:
>>> Hi,
>>>
>>> On 28/10/2020 21:14, Jan Kiszka wrote:
>>>> On 27.10.20 10:22, Jan Kiszka wrote:
>>>>> On 27.10.20 02:25, Peng Fan wrote:
>>>>>> Jan,
>>>>>>
>>>>>>> Subject: Re: [PATCH v2 00/46] arm64: Rework SMMUv2 support
>>>>>>>
>>>>>>> On 14.10.20 10:28, Jan Kiszka wrote:
>>>>>>>> Changes in v2:
>>>>>>>> - map 52-bit parange to 48
>>>>>>>>
>>>>>>>> That wasn't the plan when I started, but the more I dug into the
>>>>>>>> details and started to understand the hardware, the more issues I
>>>>>>>> found and the more dead code fragments from the Linux usage became
>>>>>>>> visible.
>>>>>>>>
>>>>>>>> Highlights of the outcome:
>>>>>>>> - Fix stall of SMMU due to unhandled stalled contexts (took me a while
>>>>>>>>   to understand that...)
>>>>>>>> - Fix programming of CBn_TCR and TTBR
>>>>>>>> - Fix TLB flush on cell exit
>>>>>>>> - Fix bogus handling of Extended StreamID support
>>>>>>>> - Do not pass-through unknown streams
>>>>>>>> - Disable SMMU on shutdown
>>>>>>>> - Reassign StreamIDs to the root cell
>>>>>>>> - 225 insertions(+), 666 deletions(-)
>>>>>>>>
>>>>>>>> The code works as expected on the Ultra96-v2 here, but due to all the
>>>>>>>> time that went into the rework, I had no chance to bring up my MX8QM
>>>>>>>> so far. I'm fairly optimistic that things are not broken there as
>>>>>>>> well, but if they are, bisecting should be rather simple with this
>>>>>>>> series. So please test and review.
>>>>>>>>
>>>>>>>
>>>>>>> Alice, Peng, already had a chance to review or test (i.e. next)?
>>>>>>
>>>>>> I gave it a test; sometimes I met an SDHC ADMA error when running
>>>>>> `jailhouse enable imx8qm.cell`, sometimes it works well.
>>>>>>
>>>>>> I suspect that during the jailhouse enable phase, there might be
>>>>>> ongoing sdhc transactions not finished, not sure.
>>>>>>
>>>>>> I have not found time to look into details.
>>>>>>
>>>>>> Anyway, you could check in to master I think, we could address
>>>>>> the issue later when I have time.
>>>>>>
>>>>>
>>>>> Hmm, I would still like to understand this first... Do you have the
>>>>> chance to bisect this effect to a commit? Otherwise, I guess I finally
>>>>> need to get my board running.
>>>>>
>>>>
>>>> It's running now (quite some effort due to the incomplete upstream state
>>>> - e.g. upstream u-boot runs but cannot boot all downstream kernels...),
>>>> but I wasn't able to reproduce startup issues. Shutting down Jailhouse
>>>> often hangs, though, at least restarting does all the time. And that
>>>> even with next. Seems we still do not properly turn off/on something here.
>>>>
>>>> Interestingly, this issue was not present on the zynqmp.
>>>
>>> On a different version of the SMMUv2 developed @ Boston University (Renato
>>> in CC), re-using the same root page table as the cell created problems due
>>> to different attributes (uncached) needed by some devices.
>>
>> Why are so many folks working downstream on such essential things? Not
>> helpful, for everyone, even if the goal should be "only" experimental
>> results.
>>
>>>
>>>> diff --git a/hypervisor/arch/arm64/smmu.c b/hypervisor/arch/arm64/smmu.c
>>>> index 41c0ffb4..60743bc0 100644
>>>> --- a/hypervisor/arch/arm64/smmu.c
>>>> +++ b/hypervisor/arch/arm64/smmu.c
>>>> @@ -220,6 +220,7 @@ static void arm_smmu_setup_context_bank(struct arm_smmu_device *smmu,
>>>>  	mmio_write32(cb_base + ARM_SMMU_CB_TCR, VTCR_CELL & ~TCR_RES0);
>>>>
>>>>  	/* TTBR0 */
>>>> +	/* Here */
>>>>  	mmio_write64(cb_base + ARM_SMMU_CB_TTBR0,
>>>>  		     paging_hvirt2phys(cell->arch.mm.root_table) &
>>>>  		     TTBR_MASK);
>>>
>>> The issue in the BU version was solved by allocating a new page for this.
>>>
>>
>> Only the root level? How were those entries different?
>
> Only the root level. IIRC, NC by default, instead of Normal.
>
>>> I wanted to check this effect for the version on next, but didn't find the
>>> time to do it so far :/
>>>
>>
>> How was the issue triggered?
>
> From the discussions I had, on the ZCU102, devices were randomly triggering
> errors or stopped working.
>
I just ran an enable/disable loop alongside flood-ping + dd on the Ultra96-v2
(I would expect it to be identical to the ZCU102 in this regard), and that
did not trigger any (visible) issues yet. I'll retry with a lower enable
frequency.

Jan

>
>>
>>
>> I made some progress meanwhile: Linux was also using the SMMU. I'll send
>> a patch shortly that detects that, like we already do in VT-d at least.
>> Interestingly, this should have been broken on the Ultra96 as well, just
>> didn't notice.
>>
>> With that, I'm running enable/disable loops now, but I lose my Ethernet
>> link after a while. Returns after ifdown/up, and the system looks fine
>> otherwise. Seems as if we drop transactions in the transition phase.
>> However, a dd running in parallel was not triggering any issues.
>>
>> Jan
>>
>

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux
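
For reference, the approach mentioned in the thread for the BU variant (giving the SMMU its own root-level table instead of reusing the cell's) could look roughly like the sketch below. This is only an illustration under assumptions: all `sketch_*` names, the single 4 KiB root table with 512 entries, the identity virt-to-phys stand-in and the mask value are hypothetical and not taken from Jailhouse. The thread does not say how the root entries' attributes were changed, so the copy is left unmodified here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: one 4 KiB root table holding 512 64-bit descriptors. */
#define SKETCH_PAGE_SIZE	4096u
#define SKETCH_ROOT_ENTRIES	(SKETCH_PAGE_SIZE / sizeof(uint64_t))
/* Assumption: table base address lives in bits [47:12]. */
#define SKETCH_TTBR_MASK	0x0000fffffffff000ull

/* Hypothetical stand-in for a virt-to-phys helper; identity mapping here. */
static uint64_t sketch_virt2phys(const void *virt)
{
	return (uint64_t)(uintptr_t)virt;
}

/*
 * Allocate a page-aligned, private copy of the cell's root-level table.
 * Any per-entry attribute adjustment would be applied to the copy after
 * the memcpy. Returns NULL if the allocation fails; the caller frees it.
 */
static uint64_t *sketch_clone_root_table(const uint64_t *cell_root)
{
	uint64_t *copy;

	assert(cell_root != NULL);

	copy = aligned_alloc(SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);
	if (!copy)
		return NULL;

	memcpy(copy, cell_root, SKETCH_PAGE_SIZE);
	return copy;
}

/* Value that would be programmed into the context bank's TTBR0. */
static uint64_t sketch_ttbr0_value(const uint64_t *smmu_root)
{
	return sketch_virt2phys(smmu_root) & SKETCH_TTBR_MASK;
}
```

With such a copy, the TTBR0 write would take `sketch_ttbr0_value(copy)` rather than the address of the cell's own root table, so later changes to the SMMU's root entries would not affect the CPU-side stage-2 mapping.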
