On 30.10.20 09:35, Jan Kiszka wrote:
> On 29.10.20 16:26, Andrea Bastoni wrote:
>> On 29/10/2020 09:53, Jan Kiszka wrote:
>>> On 29.10.20 09:39, Andrea Bastoni wrote:
>>>> On 29/10/2020 07:36, Jan Kiszka wrote:
>>>>> On 28.10.20 22:29, Andrea Bastoni wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 28/10/2020 21:14, Jan Kiszka wrote:
>>>>>>> On 27.10.20 10:22, Jan Kiszka wrote:
>>>>>>>> On 27.10.20 02:25, Peng Fan wrote:
>>>>>>>>> Jan,
>>>>>>>>>
>>>>>>>>>> Subject: Re: [PATCH v2 00/46] arm64: Rework SMMUv2 support
>>>>>>>>>>
>>>>>>>>>> On 14.10.20 10:28, Jan Kiszka wrote:
>>>>>>>>>>> Changes in v2:
>>>>>>>>>>>  - map 52-bit parange to 48
>>>>>>>>>>>
>>>>>>>>>>> That wasn't the plan when I started, but the more I dug into the
>>>>>>>>>>> details and started to understand the hardware, the more issues I
>>>>>>>>>>> found and the more dead code fragments from the Linux usage became
>>>>>>>>>> visible.
>>>>>>>>>>>
>>>>>>>>>>> Highlights of the outcome:
>>>>>>>>>>>  - Fix stall of SMMU due to unhandled stalled contexts (took me a 
>>>>>>>>>>> while
>>>>>>>>>>>    to understand that...)
>>>>>>>>>>>  - Fix programming of CBn_TCR and TTBR
>>>>>>>>>>>  - Fix TLB flush on cell exit
>>>>>>>>>>>  - Fix bogus handling of Extended StreamID support
>>>>>>>>>>>  - Do not pass-through unknown streams
>>>>>>>>>>>  - Disable SMMU on shutdown
>>>>>>>>>>>  - Reassign StreamIDs to the root cell
>>>>>>>>>>>  - 225 insertions(+), 666 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> The code works as expected on the Ultra96-v2 here, but due to all 
>>>>>>>>>>> the
>>>>>>>>>>> time that went into the rework, I had no chance to bring up my MX8QM
>>>>>>>>>>> so far. I'm fairly optimistic that things are not broken there as
>>>>>>>>>>> well, but if they are, bisecting should be rather simple with this
>>>>>>>>>>> series. So please test and review.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Alice, Peng, already had a chance to review or test (ie. next)?
>>>>>>>>>
>>>>>>>>> I gave a test, sometimes I met SDHC ADMA error when
>>>>>>>>> `jailhouse enable imx8qm.cell`, sometimes it work well.
>>>>>>>>>
>>>>>>>>> I suspect when during jailhouse enable phase, there might be
>>>>>>>>> ongoing sdhc transactions not finished, not sure.
>>>>>>>>>
>>>>>>>>> I have not find time to look into details.
>>>>>>>>>
>>>>>>>>> Anyway, you could check in to master I think, we could address
>>>>>>>>> the issue later when I have time.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hmm, I would still like to understand this first... Do you have the
>>>>>>>> chance to bisect this effect to a commit? Otherwise, I guess I finally
>>>>>>>> need to get my board running.
>>>>>>>>
>>>>>>>
>>>>>>> It's running now (quite some effort due to the incomplete upstream state
>>>>>>> - e.g. upstream u-boot runs but cannot boot all downstream kernels...),
>>>>>>> but I wasn't able to reproduce startup issues. Shutting down Jailhouse
>>>>>>> often hangs, though, at least restarting does all the time. And that
>>>>>>> even with next. Seems we still do not properly turn off/on something 
>>>>>>> here.
>>>>>>>
>>>>>>> Interestingly, this issue was not present on the zynqmp.
>>>>>>
>>>>>> On a different version of the SMMUv2 developed @ Boston University 
>>>>>> (Renato in
>>>>>> CC), re-using the same root page table as the cell created problems due 
>>>>>> to
>>>>>> different attributes (uncached) needed by some devices.
>>>>>
>>>>> Why are so many folks working downstream on such essential things? Not
>>>>> helpful, for everyone, even if the goal should be "only" experimental
>>>>> results.
>>>>>
>>>>>>
>>>>>>> diff --git a/hypervisor/arch/arm64/smmu.c b/hypervisor/arch/arm64/smmu.c
>>>>>>> index 41c0ffb4..60743bc0 100644
>>>>>>> --- a/hypervisor/arch/arm64/smmu.c
>>>>>>> +++ b/hypervisor/arch/arm64/smmu.c
>>>>>>> @@ -220,6 +220,7 @@ static void arm_smmu_setup_context_bank(struct 
>>>>>>> arm_smmu_device *smmu,
>>>>>>>         mmio_write32(cb_base + ARM_SMMU_CB_TCR, VTCR_CELL & ~TCR_RES0);
>>>>>>>  
>>>>>>>         /* TTBR0 */
>>>>>>> +       /* Here */
>>>>>>>         mmio_write64(cb_base + ARM_SMMU_CB_TTBR0,
>>>>>>>                      paging_hvirt2phys(cell->arch.mm.root_table) & 
>>>>>>> TTBR_MASK);
>>>>>>
>>>>>> The issue in the BU version was solved by allocating a new page for this.
>>>>>>
>>>>>
>>>>> Only the root level? How were those entries different?
>>>>
>>>> Only the root level. IIRC, NC by default, instead of Normal.
>>>>
>>>>>> I wanted to check this effect for the version on next, but didn't find 
>>>>>> the time
>>>>>> to do it so far :/
>>>>>>
>>>>>
>>>>> How was the issue triggered?
>>>>
>>>> From the discussions I had, on the ZCU102, devices were randomly triggering
>>>> erros/ stopped working.
>>>>
>>>
>>> I just ran a enable/disable loop aside flood-ping + dd on the Ultra96-v2
>>> (I would expect it to be identical to the ZCU102 in this regard), and
>>> that did not trigger any (visible) issues yet. I'll retry with lowering
>>> the enable frequency.
>>
>> I extended the configuration of the zynqmp-zcu102 to use the SMMU and I've
>> started similar tests (enable/disable + flood ping + find /).
>>
>> With the flooding ping I can regularly trigger ethernet errors in the diable 
>> ->
>> enable interval e.g.,:
>>
>> [  373.470078] The Jailhouse was closed.
>> [  374.957052] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK
>> [  374.966376] The Jailhouse is opening.
>>
>> Maybe just outstanding transactions.
>>
>> I got once an extended error that included the SD card
>>
>> [  112.215426] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK
>> [  112.223243] mmc0: ADMA error: 0x02000000
>> ... full dump ...
>>
>> But I cannot detect from the log if it was after the disable or while 
>> jailhouse
>> was enabled.
>>
>> I didn't have time to debug much further. I want to double check (also with a
>> colleague) my current stream_id configuration because I only covered the LPD
>> masters and I want to check the other TBUs. (I'll post the configuration once
>> I've checked it.)
>>
> 
> I think I found another issue, may "The Issue" (tm): We enable the SMMU
> prior to having the guest page table created. That is only the case on
> config-commit. I'll try to move the device-reset function there and give
> that a shot soon.
> 

Yep, I have a stable enable/disable loop on my imx8qm now. Requires some
smaller refactorings. Patches will follow.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux

-- 
You received this message because you are subscribed to the Google Groups 
"Jailhouse" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jailhouse-dev/03da8732-0592-cdc5-c4d2-b5edcb3cab5d%40siemens.com.

Reply via email to