Taylor R Campbell <riastr...@netbsd.org> writes:

>> Date: Sat, 08 Jul 2023 14:34:56 -0400
>> From: Brad Spencer <b...@anduin.eldar.org>
>> 
>> Taylor R Campbell <riastr...@netbsd.org> writes:
>> 
>> > Can you either:
>> 
>> Yes, I can perform as much of this as needed after I get some other
>> stuff in life dealt with more towards the end of the month.  I really
>> won't have any time before then.
>
> No worries!

I am looking at this problem again, having returned from my road trip.  I
built a -current from 2023-07-20 and am working with that.

>> > 1. share the output of `vmstat -e | grep -e tsc -e systime -e
>> >    hardclock' after you get the console warning;
>> 
>> The DOMU currently only has 1 vcpu, but here is the output now:
>> 
>> vcpu0 raw systime went backwards                          46579    0 intr
>> 
>> When I have real time later I will force the negative runtime to happen
>> and run the above again.
>
> This is evidence that the hypervisor is doing something wrong with the
> clock it exposes to the guest.  However, on a single-vCPU system, we
> work around this by noting the last Xen systime recorded on the
> current vCPU, and pretending the clock just hadn't changed since then.
>
> On a multi-vCPU system, we also try to work around it by recording a
> clock skew in xen_global_systime_ns and applying it to ensure the
> timestamp is monotonic, but perhaps that's not working right -- or
> perhaps it is working for 64-bit timestamps, but the jumps are so
> large that they wrap around the 32-bit timecounter arithmetic.
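
To make the two workarounds described above concrete, here is a minimal
sketch in C.  It is not the kernel's code: struct vcpu_clock,
clamp_systime() and delta32() are hypothetical names made up for
illustration, and it assumes a counter that keeps only the low 32 bits
of a nanosecond timestamp.  The first function pretends the clock has
not moved when the raw value goes backwards; the second shows how a
large forward jump can look small once it is truncated to 32 bits.

```c
#include <stdint.h>

/* Hypothetical per-vCPU state; not the kernel's actual structure. */
struct vcpu_clock {
	uint64_t last_systime_ns;	/* last timestamp returned on this vCPU */
	uint64_t backwards_count;	/* times the raw clock went backwards */
};

/*
 * Return a timestamp that never decreases on this vCPU.  If the raw
 * value is older than the last one returned, count the event and
 * return the last value again, as if the clock had not changed.
 */
static uint64_t
clamp_systime(struct vcpu_clock *vc, uint64_t raw_ns)
{
	if (raw_ns < vc->last_systime_ns) {
		vc->backwards_count++;
		return vc->last_systime_ns;
	}
	vc->last_systime_ns = raw_ns;
	return raw_ns;
}

/*
 * Elapsed time as seen by a counter that keeps only the low 32 bits
 * (an assumption for this sketch).  Unsigned subtraction wraps modulo
 * 2^32, so any multiple of 2^32 ns in the real delta is lost.
 */
static uint32_t
delta32(uint64_t prev_ns, uint64_t now_ns)
{
	return (uint32_t)now_ns - (uint32_t)prev_ns;
}
```

With these definitions, feeding clamp_systime() the raw values 1000,
900, 1100 returns 1000, 1000, 1100 and leaves backwards_count at 1,
while delta32(1000, 1000 + (1ULL << 32) + 5) comes out as 5 even though
the 64-bit difference is a little over four seconds.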
>
>> > 2. run
>> >
>> >    dtrace -n 'sdt:xen:clock: { printf("%d %d %d %d %d %d %d",
>> >    arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7) }'
>
> Note: this should now be
>
> dtrace -n 'sdt:xen:clock:, sdt:xen:hardclock:, sdt:xen:timecounter: { 
> printf("%d %d %d %d %d %d %d", arg0, arg1, arg2, arg3, arg4, arg5, arg6, 
> arg7) }'

With a DOMU kernel compiled with KDTRACE_HOOKS I get the following with
either of those dtrace probes on the DOMU:

dtrace -n 'sdt:xen:clock:, sdt:xen:hardclock:, sdt:xen:timecounter: { 
printf("%d %d %d %d %d %d %d", arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7) 
}'
dtrace: invalid probe specifier sdt:xen:clock:, sdt:xen:hardclock:, 
sdt:xen:timecounter: { printf("%d %d %d %d %d %d %d", arg0, arg1, arg2, arg3, 
arg4, arg5, arg6, arg7) }: "/usr/lib/dtrace/psinfo.d", line 46: syntax error 
near "u_int"

Now, this would have been with a 9.x userland and a -current (10.99.6)
kernel and modules from 2023-07-20.  The -current dtrace binary did the
same thing, but I didn't replace the libraries.

>> >    on the system, and leave it running with output directed to a file,
>> >    and share the output when you see the console warning; or
>> 
>> The DOMU is a 9.3_STABLE from around November 8th and when I attempted
>> to run the above dtrace it didn't work.  I got this in the messages:
>> 
>> [ 1792486.921759] kobj_checksyms, 988: [dtrace]: linker error: symbol 
>> `dtrace_invop_calltrap_addr' not found
>> [ 1792486.921759] kobj_checksyms, 988: [dtrace]: linker error: symbol 
>> `dtrace_invop_jump_addr' not found
>> [ 1792486.921759] kobj_checksyms, 988: [dtrace]: linker error: symbol 
>> `dtrace_trap_func' not found
>> [ 1792486.921759] WARNING: module error: unable to affix module `dtrace', 
>> error 8
>
> Looks like nobody has wired up dtrace to Xen!  That's a pretty serious
> regression of Xen vs native x86.  Someone needs to hook these up.

Ya, this appears to just be KDTRACE_HOOKS, as mentioned in the other
email (I should have remembered this, as I had seen it before).  That is
also needed to get the solaris module to load, which is required by the
zfs module.  The effect right now is that zfs won't work out of the box
on a DOMU.

> In the meantime, I've added a few more diagnostics to HEAD -- if you
> can boot a current kernel, that might help, or I could try to make the
> corresponding changes on netbsd-9.
>
> https://mail-index.netbsd.org/source-changes/2023/07/13/msg145973.html
> https://mail-index.netbsd.org/source-changes/2023/07/13/msg145974.html

I am currently trying to abuse the system to make the negative runtime
thing happen.  Usually this has occurred a day or so after building the
world for a couple of different system types.  We will see if it still
triggers.


If you think I should replace the DOMU userland as well, I can probably do
that too.  Updating the DOM0 kernel is a bit more involved but can also
happen with some planning.



-- 
Brad Spencer - b...@anduin.eldar.org - KC8VKS - http://anduin.eldar.org
