On 15.09.20 13:49, Oliver Schwartz wrote:
On 15 Sep 2020, at 11:00, Jan Kiszka wrote:
On 15.09.20 09:07, Oliver Schwartz wrote:
I’m currently trying out the arm64-zero-exits branch and got stuck.
System is a Xilinx ZU9EG on a custom board, similar to zcu102. I’ve
brought ATF up to date and patched it with Jan’s patch to enable SDEI.
If I don’t enable SDEI in ATF everything works as expected (with VM
exits for interrupts, of course). Jailhouse source is the tip of
branch arm64-zero-exits.
If I enable SDEI in ATF, jailhouse works most of the time, except for
when it doesn’t. Sometimes, ‘jailhouse enable’ results in:
Initializing processors:
CPU 1... OK
CPU 0...
/home/oliver/0.12-gitAUTOINC+98061469d0-r0/git/hypervisor/arch/arm64/setup.c:73:
returning error -EIO
Weird - that's the SDEI event enable call.
Yes, that’s a bit scary. The code involved in ATF is limited - I’m
pretty sure that I’m up-to-date with upstream there.
FAILED
JAILHOUSE_ENABLE: Input/output error
I’ve seen this error only when I enable jailhouse through some init
script during the boot process, when the system is also busy
otherwise. When starting jailhouse on an idle system I haven’t seen this.
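
Since the failure only shows up on some boots, it may help to classify the console output of many 'jailhouse enable' runs automatically. The following is a minimal sketch, assuming the log format quoted above; the function name summarize_enable_log and the returned keys are hypothetical and not part of Jailhouse.

```python
import re

# "CPU <n>..." optionally followed by "OK", as printed during
# "Initializing processors:" in the log quoted above.
CPU_LINE = re.compile(r"\s*CPU (\d+)\.\.\.\s*(OK)?")
# Trace line such as "returning error -EIO".
ERROR_LINE = re.compile(r"returning error (-E[A-Z]+)")


def summarize_enable_log(text):
    """Summarize one captured 'jailhouse enable' console log (hypothetical helper)."""
    ok, not_ok = [], []
    error = None
    for line in text.splitlines():
        cpu = CPU_LINE.match(line)
        if cpu:
            # A CPU line without a trailing "OK" did not finish initialization.
            (ok if cpu.group(2) else not_ok).append(int(cpu.group(1)))
        err = ERROR_LINE.search(line)
        if err:
            error = err.group(1)
    return {
        "cpus_ok": ok,
        "cpus_not_ok": not_ok,
        "error": error,
        # True once the driver has printed its activation message.
        "activated": "The Jailhouse is opening." in text,
    }
```

Running this over the logs of a series of boots would show whether the failing CPU is always the same one. It cannot tell a later hang from a successful enable, because the console output quoted in this thread looks the same in both cases up to the point of the hang.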
Possibly a regression of my recent refactoring which I didn't manage
to test yet. Could you check whether
https://github.com/siemens/jailhouse/commits/e0ef829c85895dc6387d5ea11b08aa65a456255f
is any better?
No, I don’t see any difference with that version.
Good and bad news at the same time, unfortunately ruling out a quick
solution.
Sometimes it may hang later during ‘jailhouse enable’:
Initializing processors:
CPU 1... OK
CPU 0... OK
CPU 2... OK
CPU 3... OK
Initializing unit: irqchip
Using SDEI-based management interrupt
Initializing unit: ARM SMMU v3
Initializing unit: PVU IOMMU
Initializing unit: PCI
Adding virtual PCI device 00:00.0 to cell "root"
Page pool usage after late setup: mem 67/992, remap 5/131072
Activating hypervisor
[ 5.847540] The Jailhouse is opening.
Using a JTAG debugger I see that one or more cores are stuck in
hypervisor/arch/arm-common/psci.c, line 105.
It may also succeed in stopping one or more CPUs and then hang (again
with one or more cores stuck in psci.c, line 105):
[ 5.810220] The Jailhouse is opening.
[ 5.860054] CPU1: shutdown
[ 5.862677] psci: CPU1 killed.
Has anyone else observed this? Any ideas on what may cause this? My
gut feeling is that there’s a race condition somewhere, but it feels
like looking for a needle in a haystack.
Finally, if ‘jailhouse enable’ succeeds, a subsequent ‘jailhouse
disable’ followed by another ‘jailhouse enable’ always hangs the
system (cores stuck in psci.c).
Otherwise, once ‘jailhouse enable’ succeeds the system is working
fine and I can start, stop, load and unload my guest inmate at will.
To make matters a bit more complicated: My system is based on Xilinx
Petalinux 2018.2. For various reasons I’m stuck with the ATF version
that comes with Petalinux
(https://github.com/Xilinx/arm-trusted-firmware/tree/xilinx-v2018.2),
which is a bit dated. To get SDEI to work I had to backport a number
of patches from later releases. I am quite confident that SDEI and
EHF handling are now up-to-date with the master branch from Arm’s ATF
repository, but there remains a chance that I missed something and
the issues above result from something in ATF.
OK, obviously that different ATF is another critical point, also in
the light of SDEI_EVENT_ENABLE failing.
Sure. If you or others haven’t ever seen the above behaviour then the
issue is most likely on this side and I have to do another comparison of
my ATF sources to upstream.
Theoretically, it might also be a hidden issue in the ATF patch itself,
just exposed by your different setup.
Can't you get your board running with the upstream ATF version, just
for the Jailhouse test? Then we would know better where to dig.
That was my first approach, but I didn’t get very far. With upstream ATF
from Arm my (Xilinx enhanced) kernel doesn’t boot. Exchanging the kernel
opens another can of worms, but I’ll see what I can do.
I managed to boot with ATF from master in the Xilinx repository. I also
had to update the PMU Firmware to make this work. The resulting system
was acting strange in a number of ways. Jailhouse showed the same
occasional hangs during initial CPU shutdown, but given the overall
unstable system I abandoned any further investigations and resorted to
patching the working ATF.
OK, sounds frightening, indeed. The overall degree of adjustments you
have to apply to even get booting systems is, well, demotivating with
that platform.
Anyway, pick the most reproducible effect, probably that failing
EVENT_ENABLE, and try to debug that in depth in the hope of finding a
single magic root cause. Nasty things come with multi-cause problems,
though, and I've seen too many already.
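
As a starting point for that debugging: the -EIO in the log does not show which status the firmware actually returned for the event enable call. Below is a minimal sketch for decoding it, assuming the raw x0 value after the SMC has been captured somehow (for example with a temporary debug print or via JTAG). The helper name decode_sdei_status is hypothetical. The status values are the ones I know from the SDEI specification and should be double-checked against the ATF sources in use.

```python
# SDEI status codes as defined by the Arm SDEI specification
# (listed from memory; verify against the firmware sources in use).
SDEI_STATUS = {
    0: "SUCCESS",
    -1: "NOT_SUPPORTED",
    -2: "INVALID_PARAMETERS",
    -3: "DENIED",
    -5: "PENDING",
    -10: "OUT_OF_RESOURCE",
}


def decode_sdei_status(x0):
    """Interpret a raw 64-bit x0 value as a signed SDEI status (hypothetical helper)."""
    value = x0 & 0xFFFFFFFFFFFFFFFF
    if value >= 1 << 63:
        # Convert from two's complement to a negative Python int.
        value -= 1 << 64
    return SDEI_STATUS.get(value, "UNKNOWN(%d)" % value)


if __name__ == "__main__":
    # Example: x0 as it would appear in a register dump.
    print(decode_sdei_status(0xFFFFFFFFFFFFFFFD))
```

Which of these codes comes back should help narrow down whether the event state in ATF is not what the hypervisor expects at that point or whether the call parameters are being rejected.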
Jan
--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux