On Wed, May 20, 2026, David Woodhouse wrote:
> On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> > Add a helper to register non-native, i.e. PV and CoCo, CPU and TSC
> > frequency calibration routines. This will allow consolidating handling
> > of common TSC properties that are forced by hypervisor (PV routines),
> > and will also allow adding sanity checks to guard against overriding a
> > TSC calibration routine with a routine that is less robust/trusted.
> >
> > Make the CPU calibration routine optional, as Xen (very sanely) doesn't
> > assume the CPU runs as the same frequency as the TSC.
> >
> > Wrap the helper in an #ifdef to document that the kernel overrides
> > the native routines when running as a VM, and to guard against unwanted
> > usage. Add a TODO to call out that AMD_MEM_ENCRYPT is a mess and doesn't
> > depend on HYPERVISOR_GUEST because it gates both guest and host code.
> >
> > No functional change intended.
> >
> > Reviewed-by: Michael Kelley <[email protected]>
> > Tested-by: Michael Kelley <[email protected]>
> > Signed-off-by: Sean Christopherson <[email protected]>
>
> Mildly concerned that we might want to support multiple options — does
> it have CPUID 0x15? Does it have 0x40000x10? Does it have a pvclock?
> There are various permutations of those which are perhaps best handled
> by *trying* each one, in some order, and populating a struct with the
> answers?
>
> But on the basis that perfect is the enemy of good,
This has been bothering me too.
Aha! AHA! Idea.
... 4 hours later ...
Mhahahaahah, victory is mine!!!!
TL;DR: Overriding x86_platform_ops hooks is dumb.
To your point about making an informed decision, that's essentialy what this
series
is already doing, just in a very roundabout way:
1. x86_platform.calibrate_{cpu,tsc}() are initialized to "native" versions
2. Hypervisor init code runs and conditionally overrides calibrate_{cpu,tsc}()
3. CoCo init code runs and conditionally overrides calibrate_{cpu,tsc}()
So the ordering you want is already there, as is "trying" each source to some
extent, in the form of steps #2 and #3 overriding the hooks if and only if their
source of information is valid. For all intents and purposes, the hardening I
was adding by formalizing the calibration overrides was to enforce the above
ordering.
But that's obviously all but impossible to follow, _and_ it's pointless.
For every PV case, including TDX and SNP, "calibration" is simply information
retrieval, i.e. it never changes (barring broken hypervisors/firmware), and the
information is always available during early boot.
Contrast that with the pre-CPUID CPU frequency calibration, where the frequency
might change, the kernel is making a best guest based on other timekeeping
sources,
and not all timekeeping sources are available during early boot.
And so overriding x86_platform.calibrate_{cpu,tsc}() for PV code is completely
unecessary, because steps #2 and #3 already know the frequency when they
override
the hooks, and "success" is guaranteed, i.e. the kernel won't have to switch to
a
"late" calibration flow.
If we provide x86_hyper_init hooks:
unsigned int (*get_tsc_khz)(void);
unsigned int (*get_cpu_khz)(void);
then we can kill off x86_platform.calibrate_{cpu,tsc}() entirely, explicitly
define the preferred ordering (user-forced => CoCo => Hypervisor => native), and
depup some of the hypervisor code.
E.g. this is what I've got for the early flow. Testing now.
void __init tsc_early_init(void)
{
unsigned int known_cpu_khz = 0, known_tsc_khz = 0;
if (!boot_cpu_has(X86_FEATURE_TSC))
return;
/* Don't change UV TSC multi-chassis synchronization */
if (is_early_uv_system())
return;
if (x86_init.hyper.get_cpu_khz)
known_cpu_khz = x86_init.hyper.get_cpu_khz();
if (tsc_early_khz)
known_tsc_khz = tsc_early_khz;
else if (cc_platform_has(CC_ATTR_GUEST_SNP_SECURE_TSC))
known_tsc_khz = snp_secure_tsc_init();
else if (boot_cpu_has(X86_FEATURE_TDX_GUEST))
known_tsc_khz = tdx_tsc_init();
/*
* If the TSC frequency is still unknown, i.e. not provided by the user
* or by trusted firmware, try to get it from the hypervisor (which is
* untrusted when running as a CoCo guest).
*/
if (!known_tsc_khz && x86_init.hyper.get_tsc_khz)
known_tsc_khz = x86_init.hyper.get_tsc_khz();
if (known_tsc_khz)
setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
if (!determine_cpu_tsc_frequencies(true, known_cpu_khz, known_tsc_khz))
return;
tsc_enable_sched_clock();
}