On 30.10.20 09:35, Jan Kiszka wrote: > On 29.10.20 16:26, Andrea Bastoni wrote: >> On 29/10/2020 09:53, Jan Kiszka wrote: >>> On 29.10.20 09:39, Andrea Bastoni wrote: >>>> On 29/10/2020 07:36, Jan Kiszka wrote: >>>>> On 28.10.20 22:29, Andrea Bastoni wrote: >>>>>> Hi, >>>>>> >>>>>> On 28/10/2020 21:14, Jan Kiszka wrote: >>>>>>> On 27.10.20 10:22, Jan Kiszka wrote: >>>>>>>> On 27.10.20 02:25, Peng Fan wrote: >>>>>>>>> Jan, >>>>>>>>> >>>>>>>>>> Subject: Re: [PATCH v2 00/46] arm64: Rework SMMUv2 support >>>>>>>>>> >>>>>>>>>> On 14.10.20 10:28, Jan Kiszka wrote: >>>>>>>>>>> Changes in v2: >>>>>>>>>>> - map 52-bit parange to 48 >>>>>>>>>>> >>>>>>>>>>> That wasn't the plan when I started, but the more I dug into the >>>>>>>>>>> details and started to understand the hardware, the more issues I >>>>>>>>>>> found and the more dead code fragments from the Linux usage became >>>>>>>>>> visible. >>>>>>>>>>> >>>>>>>>>>> Highlights of the outcome: >>>>>>>>>>> - Fix stall of SMMU due to unhandled stalled contexts (took me a >>>>>>>>>>> while >>>>>>>>>>> to understand that...) >>>>>>>>>>> - Fix programming of CBn_TCR and TTBR >>>>>>>>>>> - Fix TLB flush on cell exit >>>>>>>>>>> - Fix bogus handling of Extended StreamID support >>>>>>>>>>> - Do not pass-through unknown streams >>>>>>>>>>> - Disable SMMU on shutdown >>>>>>>>>>> - Reassign StreamIDs to the root cell >>>>>>>>>>> - 225 insertions(+), 666 deletions(-) >>>>>>>>>>> >>>>>>>>>>> The code works as expected on the Ultra96-v2 here, but due to all >>>>>>>>>>> the >>>>>>>>>>> time that went into the rework, I had no chance to bring up my MX8QM >>>>>>>>>>> so far. I'm fairly optimistic that things are not broken there as >>>>>>>>>>> well, but if they are, bisecting should be rather simple with this >>>>>>>>>>> series. So please test and review. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Alice, Peng, already had a chance to review or test (ie. next)? >>>>>>>>> >>>>>>>>> I gave a test, sometimes I met SDHC ADMA error when >>>>>>>>> `jailhouse enable imx8qm.cell`, sometimes it work well. >>>>>>>>> >>>>>>>>> I suspect when during jailhouse enable phase, there might be >>>>>>>>> ongoing sdhc transactions not finished, not sure. >>>>>>>>> >>>>>>>>> I have not find time to look into details. >>>>>>>>> >>>>>>>>> Anyway, you could check in to master I think, we could address >>>>>>>>> the issue later when I have time. >>>>>>>>> >>>>>>>> >>>>>>>> Hmm, I would still like to understand this first... Do you have the >>>>>>>> chance to bisect this effect to a commit? Otherwise, I guess I finally >>>>>>>> need to get my board running. >>>>>>>> >>>>>>> >>>>>>> It's running now (quite some effort due to the incomplete upstream state >>>>>>> - e.g. upstream u-boot runs but cannot boot all downstream kernels...), >>>>>>> but I wasn't able to reproduce startup issues. Shutting down Jailhouse >>>>>>> often hangs, though, at least restarting does all the time. And that >>>>>>> even with next. Seems we still do not properly turn off/on something >>>>>>> here. >>>>>>> >>>>>>> Interestingly, this issue was not present on the zynqmp. >>>>>> >>>>>> On a different version of the SMMUv2 developed @ Boston University >>>>>> (Renato in >>>>>> CC), re-using the same root page table as the cell created problems due >>>>>> to >>>>>> different attributes (uncached) needed by some devices. >>>>> >>>>> Why are so many folks working downstream on such essential things? Not >>>>> helpful, for everyone, even if the goal should be "only" experimental >>>>> results. >>>>> >>>>>> >>>>>>> diff --git a/hypervisor/arch/arm64/smmu.c b/hypervisor/arch/arm64/smmu.c >>>>>>> index 41c0ffb4..60743bc0 100644 >>>>>>> --- a/hypervisor/arch/arm64/smmu.c >>>>>>> +++ b/hypervisor/arch/arm64/smmu.c >>>>>>> @@ -220,6 +220,7 @@ static void arm_smmu_setup_context_bank(struct >>>>>>> arm_smmu_device *smmu, >>>>>>> mmio_write32(cb_base + ARM_SMMU_CB_TCR, VTCR_CELL & ~TCR_RES0); >>>>>>> >>>>>>> /* TTBR0 */ >>>>>>> + /* Here */ >>>>>>> mmio_write64(cb_base + ARM_SMMU_CB_TTBR0, >>>>>>> paging_hvirt2phys(cell->arch.mm.root_table) & >>>>>>> TTBR_MASK); >>>>>> >>>>>> The issue in the BU version was solved by allocating a new page for this. >>>>>> >>>>> >>>>> Only the root level? How were those entries different? >>>> >>>> Only the root level. IIRC, NC by default, instead of Normal. >>>> >>>>>> I wanted to check this effect for the version on next, but didn't find >>>>>> the time >>>>>> to do it so far :/ >>>>>> >>>>> >>>>> How was the issue triggered? >>>> >>>> From the discussions I had, on the ZCU102, devices were randomly triggering >>>> erros/ stopped working. >>>> >>> >>> I just ran a enable/disable loop aside flood-ping + dd on the Ultra96-v2 >>> (I would expect it to be identical to the ZCU102 in this regard), and >>> that did not trigger any (visible) issues yet. I'll retry with lowering >>> the enable frequency. >> >> I extended the configuration of the zynqmp-zcu102 to use the SMMU and I've >> started similar tests (enable/disable + flood ping + find /). >> >> With the flooding ping I can regularly trigger ethernet errors in the diable >> -> >> enable interval e.g.,: >> >> [ 373.470078] The Jailhouse was closed. >> [ 374.957052] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK >> [ 374.966376] The Jailhouse is opening. >> >> Maybe just outstanding transactions. >> >> I got once an extended error that included the SD card >> >> [ 112.215426] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK >> [ 112.223243] mmc0: ADMA error: 0x02000000 >> ... full dump ... >> >> But I cannot detect from the log if it was after the disable or while >> jailhouse >> was enabled. >> >> I didn't have time to debug much further. I want to double check (also with a >> colleague) my current stream_id configuration because I only covered the LPD >> masters and I want to check the other TBUs. (I'll post the configuration once >> I've checked it.) >> > > I think I found another issue, may "The Issue" (tm): We enable the SMMU > prior to having the guest page table created. That is only the case on > config-commit. I'll try to move the device-reset function there and give > that a shot soon. >
Yep, I have a stable enable/disable loop on my imx8qm now. Requires some smaller refactorings. Patches will follow. Jan -- Siemens AG, T RDA IOT Corporate Competence Center Embedded Linux -- You received this message because you are subscribed to the Google Groups "Jailhouse" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/jailhouse-dev/03da8732-0592-cdc5-c4d2-b5edcb3cab5d%40siemens.com.
