Hello Lior,

On 03.08.23 13:17, Lior Weintraub wrote:
> Hi Ahmad,
> 
> Hope you had a great time at EOSS 2023 :-)

Thanks and sorry for the late answer.

> Quick recap and additional info on the current issue:
> 
> 1. 
> The spider-soc QEMU with the additional GICv3 and timers was tested with
> bare-metal code and proved to be OK.
> This bare-metal code sets up the A53 timers and GICv3 to handle interrupts at
> various exception levels as well as various security levels:
> EL1_NS_PHYSICAL_TIMER set as GROUP1_NON_SECURE
> EL1_SCR_PHYSICAL_TIMER set as GROUP1_SECURE
> EL2_PHYSICAL_TIMER set as GROUP1_SECURE
> VIRTUAL_TIMER set as GROUP1_NON_SECURE

ok.

> 2.
> The kernel we build with Buildroot runs OK on virt QEMU but gets stuck in the 
> middle when we use our spider-soc QEMU.
> There are a few differences between those runs:
> a.
> The virt QEMU is executed with the -kernel switch, and hence QEMU itself
> implements the "bootloader" and prepares the DT given to the kernel.
> When the kernel starts on this platform it starts at EL1.

This can be changed on virt, e.g. with -M virt,virtualization=on, I think.

> b.
> The spider-soc QEMU is executed with -device loader,file=spider-soc-bl1.elf
> Just for easy execution and testing, this executable includes all the needed 
> binaries (as const data blobs) and it copies the binaries into correct 
> locations before jumping to Barebox execution.
> The list of binaries includes the barebox, kernel, dt, and rootfs.
> As you recall, BL31 is compiled via Trusted-Firmware-A and has all its
> functions as empty stubs because we currently don't care about CPU power
> states.
> The proof that BL31 is executed correctly is that Barebox now runs at EL2.

Good.

> At that point the Linux kernel is starting and as I mentioned gets stuck in 
> the middle (cpu_do_idle function. more details to follow).
> 
> Debugging the kernel with GDB revealed a few differences:
> 1. When running with Barebox, the kernel starts at EL2 and at some point 
> moves to EL1.
> Not sure if that has some impact on the following issue but thought it is 
> worth mentioning.
> (We get a "CPU: All CPU(s) started at EL2" trace)

I get the same on an i.MX8M as well (multi-core Cortex-A53 SoC).

> Another difference that might be related to this exception level is that the
> timer settings show that it uses the physical timer (as opposed to the virt
> QEMU run, which uses the virtual timer):
> The spider-soc QEMU Timers dump:
> CNTFRQ_EL0 = 0x3b9aca0
> CNTP_CTL_EL0 = 0x5
> CNTV_CTL_EL0 = 0x0
> CNTP_TVAL_EL0 = 0xff1f2ad5
> CNTP_CVAL_EL0 = 0xac5c3240
> CNTV_TVAL_EL0 = 0x52c2d916
> CNTV_CVAL_EL0 = 0x0
> 
> The virt QEMU Timers dump:
> CNTFRQ_EL0 = 0x3b9aca0
> CNTP_CTL_EL0 = 0x0
> CNTV_CTL_EL0 = 0x5
> CNTP_TVAL_EL0 = 0xb8394fbc
> CNTP_CVAL_EL0 = 0x0
> CNTV_TVAL_EL0 = 0xffd18e39
> CNTV_CVAL_EL0 = 0x479858aa
> 
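
A side note on reading these dumps: the sketch below decodes the two control
registers and CNTFRQ_EL0. It assumes the usual Arm generic timer layout for
CNTP_CTL_EL0/CNTV_CTL_EL0 (ENABLE = bit 0, IMASK = bit 1, ISTATUS = bit 2).
The helper name is made up for illustration and is not a kernel or QEMU API.

```python
# Hypothetical helper for reading the dumps above; not a kernel or QEMU API.
# Assumed layout of CNTP_CTL_EL0 / CNTV_CTL_EL0 (Arm generic timer):
#   bit 0 ENABLE, bit 1 IMASK, bit 2 ISTATUS
def decode_timer_ctl(value):
    return {
        "enable": bool(value & 0x1),
        "imask": bool(value & 0x2),
        "istatus": bool(value & 0x4),
    }

# Values copied from the two dumps quoted above.
dumps = {
    "spider-soc": {"CNTP_CTL_EL0": 0x5, "CNTV_CTL_EL0": 0x0},
    "virt": {"CNTP_CTL_EL0": 0x0, "CNTV_CTL_EL0": 0x5},
}
CNTFRQ_EL0 = 0x3B9ACA0  # identical in both runs

for machine, regs in dumps.items():
    for name, value in regs.items():
        fields = decode_timer_ctl(value)
        print(machine, name, hex(value), fields)

print("counter frequency: %.1f MHz" % (CNTFRQ_EL0 / 1e6))
```

With these values, 0x5 means the timer is enabled, not masked, and its
condition is met, so the physical timer is the one asserting its interrupt
on spider-soc and the virtual timer on virt. CNTFRQ_EL0 works out to
62.5 MHz in both runs.
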
> 2. When running with Barebox, the kernel fails to correctly set the GICv3 
> registers.
> So in other words, there are no timer events and hence the scheduler is not 
> running.
> The code gets stuck in cpu_do_idle, but we also found that the RCU cblist is
> not empty (which probably explains why the scheduler hasn't started; just a
> guess).
> We placed a breakpoint just before the call to wait_for_completion (in
> function rcu_barrier in kernel/rcu/tree.c) and found:
> bt
> #0  rcu_barrier () at kernel/rcu/tree.c:4064
> #1  0xffffffc08059e1b4 in mark_readonly () at init/main.c:1789
> #2  kernel_init (unused=<optimized out>) at init/main.c:1838
> #3  0xffffffc080015e48 in ret_from_fork () at arch/arm64/kernel/entry.S:853
> 
> At that point rcu_state.barrier_cpu_count.counter is 1 (as opposed to virt
> QEMU, where it is 0 at that point).
> If we place the breakpoint a bit earlier in this rcu_barrier function (just
> before the for_each_possible_cpu loop) and run a few more steps (to get the
> rdp) we see that rdp->cblist.len is 0x268 (616):
> p/x rdp->cblist
> $1 = {head = 0xffffffc0808f06d0, tails = {0xffffff802fe55a78, 
> 0xffffff802fe55a78, 0xffffff802fe55a78, 0xffffff80001c22c8}, gp_seq = {0x0, 
> 0x0, 0x0, 0x0}, len = 0x268, seglen = {0x0, 0x0, 0x0, 0x268}, flags = 0x1}
> 
> When we compare that with virt QEMU we see that the rdp->cblist.len is 0 
> there.
> 
> IMHO, this is all a result of the GICv3 settings not being applied
> properly.
> As a result there are no timer interrupts.
> 
> Further debugging of the GICv3 settings showed that the code (function
> gic_cpu_init in drivers/irqchip/irq-gic-v3.c) tries to write 0xffffffff to
> GICR_IGROUPR0 (configure SGIs/PPIs as non-secure Group 1), but when we read
> it back we get all zeros.
> Dumping GICv3 settings after the call to init_IRQ:
> Showing only the differences:
>                       Spider-SoC QEMU         virt QEMU
> GICD_CTLR =           0x00000012              0x00000053
> GICD_TYPER =          0x037a0402              0x037a0007
> GICR0_IGROUPR0 =      0x00000000              0xffffffff
> GICR0_ISENABLER0 =    0x00000000              0x0000007f
> GICR0_ICENABLER0 =    0x00000000              0x0000007f
> GICR0_ICFGR0 =        0x00000000              0xaaaaaaaa
> 
> Any thoughts?
> As always, your support is much appreciated!

Sorry to disappoint, but I have no hands-on experience with the GIC.
My guess would be that you are missing initialization in the TF-A...
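
If it helps to check that guess, here is a rough decode of two of the
registers that differ. The bit positions are my reading of the GICv3
architecture specification (GICD_CTLR.DS = bit 6, GICD_TYPER.SecurityExtn =
bit 10, GICD_TYPER.ITLinesNumber = bits [4:0]), so please double-check them.
The function name is made up for illustration.

```python
# Hypothetical decode helper; bit positions are assumptions taken from my
# reading of the GICv3 architecture specification, not from the kernel source.
def decode_gicd(ctlr, typer):
    it_lines = typer & 0x1F  # GICD_TYPER.ITLinesNumber, bits [4:0]
    return {
        # GICD_CTLR.DS (bit 6): 1 means security is disabled in the GIC
        "ds": bool(ctlr & (1 << 6)),
        # GICD_TYPER.SecurityExtn (bit 10): two security states supported
        "security_extn": bool(typer & (1 << 10)),
        # Highest SPI INTID is 32 * (ITLinesNumber + 1) - 1
        "max_spi_intid": 32 * (it_lines + 1) - 1,
    }

# Values copied from the table quoted above.
spider = decode_gicd(0x00000012, 0x037A0402)
virt = decode_gicd(0x00000053, 0x037A0007)

print("spider-soc:", spider)
print("virt:      ", virt)
```

If that reading is right, spider-soc runs the GIC with two security states
(DS=0, SecurityExtn=1) while virt does not (DS=1). As far as I understand
the spec, GICR_IGROUPR0 is RAZ/WI for non-secure accesses when DS is 0,
which would match the write in gic_cpu_init reading back as zeros, and
would mean the interrupt groups have to be set up from the secure world.
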

Cheers,
Ahmad

> 
> Cheers,
> Lior. 
> 
> 
>> -----Original Message-----
>> From: Ahmad Fatoum <[email protected]>
>> Sent: Friday, June 30, 2023 8:53 AM
>> To: Lior Weintraub <[email protected]>; Ahmad Fatoum <[email protected]>;
>> [email protected]
>> Subject: Re: [PATCH v2] Porting barebox to a new SoC
>>
>> Hi Lior,
>>
>> On 25.06.23 22:33, Lior Weintraub wrote:
>>> Hello Ahmad,
>>
>> [Sorry for the delay, we're at EOSS 2023 currently]
>>
>>> I failed to reproduce this issue on virt because the addresses and
>>> peripherals on virt machine are different and it is difficult to change
>>> our code to match that.
>>> If you think this is critical I will make extra effort to make it work.
>>> AFAIU, this suggestion was made to debug the "conflict" issue.
>>
>> It's not critical, but I'd have liked to understand this, so I can check
>> if it's perhaps a barebox bug.
>>
>>> Currently the workaround I am using is just to set the size of the kernel
>>> partition to match the exact size of the "Image" file.
>>>
>>> The other issue I am facing is that Kernel seems stuck on cpu_do_idle and
>>> there is no login prompt from the kernel.
>>
>> Does it call into PSCI during idle?
>>
>>> As you recall, I am running on a custom QEMU that tries to emulate our
>>> platform.
>>> I suspect that I did something wrong with the GICv3 and Timers connectivity.
>>> The code I used was based on examples I saw on sbsa-ref.c and virt.c.
>>> In addition, I declared the GICv3 and timers on our device tree.
>>>
>>> I am running QEMU with "-d int" so I am also getting a trace of exceptions
>>> and interrupts.
>>
>> Nice. Didn't know about this option.
>>
>> [snip]
>>
>>> Exception return from AArch64 EL3 to AArch64 EL1 PC 0xffffffc00802112c
>>> Taking exception 13 [Secure Monitor Call] on CPU 0
>>> ...from EL1 to EL3
>>> ...with ESR 0x17/0x5e000000
>>> ...with ELR 0xffffffc008021640
>>> ...to EL3 PC 0x10005400 PSTATE 0x3cd
>>> Exception return from AArch64 EL3 to AArch64 EL1 PC 0xffffffc008021640
>>
>> Looks fine so far? Doesn't look like it's hanging in EL1.
>>
>> [snip]
>>
>>> Segment Routing with IPv6
>>> In-situ OAM (IOAM) with IPv6
>>> sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
>>> NET: Registered PF_PACKET protocol family
>>> NET: Registered PF_KEY protocol family
>>> NET: Registered PF_VSOCK protocol family
>>> registered taskstats version 1
>>> clk: Disabling unused clocks
>>> Freeing unused kernel memory: 1664K
>>
>> Not sure. Normally, I'd try again with pd_ignore_unused clk_ignore_unused in
>> the kernel arguments, but I think you define no clocks or power domains yet
>> in the DT?
>>
>> You can try again with kernel command line option initcall_debug and see
>> what the initcall is that is getting stuck. If nothing helps, maybe attach
>> a hardware debugger?
>>
>> Cheers,
>> Ahmad
>>
>> --
>> Pengutronix e.K.                           |                             |
>> Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
>> 31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
>> Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
> 

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

