On Mon, Jun 22, 2020 at 05:07:36AM +0000, Koakuma wrote:
> I've tried your fix on tech@, it seems to solve the problem - at least up
> until writing it to SP storage and changing the configuration. When I
> rebooted, though, it panics with:
That is still on 6.7-stable, right?
Is the machine's firmware up to date? Recent firmware on my T4-2 looks
like this:
-> show /HOST hypervisor_version obp_version sysfw_version
/HOST
Properties:
hypervisor_version = Hypervisor 1.15.16 2018/11/28 07:41
obp_version = OpenBoot 4.38.16 2018/11/28 07:24
sysfw_version = Sun System Firmware 8.9.11 2018/11/28 07:59
I cannot downgrade my only T4-2 in production to test this, but using a
snapshot with my ldomctl diff from tech@, I did the following to recreate
its existing configuration and reboot with it:
# sysctl -n kern.version
OpenBSD 6.7-beta (GENERIC.MP) #299: Wed Apr 29 17:49:31 MDT 2020
[email protected]:/usr/src/sys/arch/sparc64/compile/GENERIC.MP
# ls factory-default/
hv.md pri primary.md
# cp -R factory-default production-new
# cd production-new
# cp ../production/ldom.conf .
# ldomctl init-system ldom.conf
# cd ..
# ldomctl download production-new
# ldomctl select production-new
# halt
-> reset /SYS
All domains come up, ldomd runs fine on the primary, and the assigned
PCIe devices keep attaching in the guest domains just like before.
Did you assign any PCIe devices to guest domains with `iodevice'?
Did you start with a fresh copy of the factory-default dump?
Please show the exact ldom.conf file you used.
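For reference, a minimal ldom.conf that assigns a PCIe device to a guest
looks roughly like the following; the domain name, sizes, paths and the
iodevice string are placeholders, not taken from my configuration, so
adjust them to your setup:

domain primary {
	vcpu 8
	memory 8G
}

domain guest {
	vcpu 8
	memory 4G
	vdisk "/home/guest/vdisk0"
	vnet
	iodevice "pci@400/pci@1/pci@0/pci@8"
}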
> panic: rw_enter: vmmaplk locking against myself
>
> Going back to factory-default config seems to make it not panic anymore,
> though.
> I've attached the ddb log and relevant objdump parts too.
No clue yet, sorry.
> Uh, I am sorry, I forgot to include the ldomd backtrace.
> Recompiling it with DEBUG="-g3 -O0" and running it under gdb gave me
> this backtrace:
>
> #0 *_libc_exit (status=1) at /usr/src/lib/libc/stdlib/exit.c:54
> #1 0x000000a2c4acdb24 in *_libc_verr (eval=1, fmt=0x0,
> ap=0xfffffffffffbd0d0) at /usr/src/lib/libc/gen/verr.c:50
> #2 0x000000a2c4b23050 in *_libc_err (eval=1, fmt=0x0) at
> /usr/src/lib/libc/gen/err.c:40
> #3 0x000000a008607450 in xmalloc (size=24) at
> /usr/src/usr.sbin/ldomd/../ldomctl/util.c:36
> #4 0x000000a0086018e8 in add_frag (base=44040192) at
> /usr/src/usr.sbin/ldomd/ldomd.c:348
> #5 0x000000a0086017c0 in add_frag_mblock (node=0xa21182d7c0) at
> /usr/src/usr.sbin/ldomd/ldomd.c:333
> #6 0x000000a0086016e8 in frag_init () at /usr/src/usr.sbin/ldomd/ldomd.c:320
> #7 0x000000a008600f04 in main (argc=0, argv=0xfffffffffffbda78) at
> /usr/src/usr.sbin/ldomd/ldomd.c:192
Much better, thanks.
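To illustrate what those frames mean, the allocation pattern is roughly
the sketch below. This is not the actual ldomd source; the struct layout,
FRAG_SIZE and the sizes in main() are my assumptions, only xmalloc()'s
err(1, NULL) and the 24-byte allocation come from the backtrace. Every
FRAG_SIZE chunk of every memory mblock costs one small heap allocation,
so a bogus or oversized mblock in the machine description can drive
malloc(3) into failure and take ldomd down:

#include <sys/queue.h>
#include <err.h>
#include <stdint.h>
#include <stdlib.h>

#define FRAG_SIZE	(8ULL * 1024 * 1024)	/* assumed granularity */

struct frag {
	TAILQ_ENTRY(frag) link;
	uint64_t base;		/* 16 + 8 = 24 bytes, as in frame #3 */
};
TAILQ_HEAD(, frag) free_frags = TAILQ_HEAD_INITIALIZER(free_frags);

void *
xmalloc(size_t size)
{
	void *p = malloc(size);

	if (p == NULL)
		err(1, NULL);	/* frames #0-#2: exit through err() */
	return p;
}

void
add_frag(uint64_t base)
{
	struct frag *frag = xmalloc(sizeof(*frag));

	frag->base = base;
	TAILQ_INSERT_TAIL(&free_frags, frag, link);
}

void
add_frag_mblock(uint64_t base, uint64_t size)
{
	uint64_t offset;

	/* one small allocation per FRAG_SIZE chunk of the mblock */
	for (offset = 0; offset < size; offset += FRAG_SIZE)
		add_frag(base + offset);
}

int
main(void)
{
	/* e.g. a 4G mblock needs 512 allocations at 8M granularity */
	add_frag_mblock(8ULL * 1024 * 1024 * 1024,
	    4ULL * 1024 * 1024 * 1024);
	return 0;
}

So the interesting number is how much memory the machine description
tells ldomd to carve into fragments.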
> Is there anything else that I could do to help?
Install the sysutils/mdprint package and show the output of
`mdprint -q -g frag_space,frag_mblock hv.md'; this will show us the
amount of memory ldomd is trying to allocate (in many small fragments
like the size=24 one seen in the backtrace).
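If the package is not installed yet, something like this is all it takes
(run it in the directory that holds the hv.md you booted from):

# pkg_add mdprint
# mdprint -q -g frag_space,frag_mblock hv.md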
Again: I have never seen this on any machine; my T4-2 just works with it.
Perhaps you have a bogus machine description? I vaguely remember
`ldomctl init-system' behaving differently/crashing when *.md files
already existed, so it is best to always run it with nothing but a clean
copy of the "factory-default" dump and an ldom.conf text file.
Can you try to reproduce this on recent snapshots?