On Thu, Sep 04, 2025 at 07:38:53PM -0700, Mukesh R wrote: > On 9/4/25 15:37, Stanislav Kinsburskii wrote: > > On Wed, Sep 03, 2025 at 07:10:16PM -0700, Mukesh Rathor wrote: > >> + > >> +/* > >> + * Common function for all cpus before devirtualization. > >> + * > >> + * Hypervisor crash: all cpus get here in nmi context. > >> + * Linux crash: the panicing cpu gets here at base level, all others in > >> nmi > >> + * context. Note, panicing cpu may not be the bsp. > >> + * > >> + * The function is not inlined so it will show on the stack. It is named > >> so > >> + * because the crash cmd looks for certain well known function names on > >> the > >> + * stack before looking into the cpu saved note in the elf section, and > >> + * that work is currently incomplete. > >> + * > >> + * Notes: > >> + * Hypervisor crash: > >> + * - the hypervisor is in a very restrictive mode at this point and any > >> + * vmexit it cannot handle would result in reboot. For example, > >> console > >> + * output from here would result in synic ipi hcall, which would > >> result > >> + * in reboot. So, no mumbo jumbo, just get to kexec as quickly as > >> possible. > >> + * > >> + * Devirtualization is supported from the bsp only. > >> + */ > >> +static noinline __noclone void crash_nmi_callback(struct pt_regs *regs) > >> +{ > >> + struct hv_input_disable_hyp_ex *input; > >> + u64 status; > >> + int msecs = 1000, ccpu = smp_processor_id(); > >> + > >> + if (ccpu == 0) { > >> + /* crash_save_cpu() will be done in the kexec path */ > >> + cpu_emergency_stop_pt(); /* disable performance trace */ > >> + atomic_inc(&crash_cpus_wait); > >> + } else { > >> + crash_save_cpu(regs, ccpu); > >> + cpu_emergency_stop_pt(); /* disable performance trace */ > >> + atomic_inc(&crash_cpus_wait); > >> + for (;;); /* cause no vmexits */ > >> + } > >> + > >> + while (atomic_read(&crash_cpus_wait) < num_online_cpus() && msecs--) > >> + mdelay(1); > >> + > >> + stop_nmi(); > >> + if (!hv_has_crashed) > >> + hv_notify_prepare_hyp(); > >> + > >> + if (crashing_cpu == -1) > >> + crashing_cpu = ccpu; /* crash cmd uses this */ > >> + > >> + hv_hvcrash_ctxt_save(); > >> + hv_mark_tss_not_busy(); > >> + hv_crash_fixup_kernpt(); > >> + > >> + input = *this_cpu_ptr(hyperv_pcpu_input_arg); > >> + memset(input, 0, sizeof(*input)); > >> + input->rip = trampoline_pa; /* PA of hv_crash_asm32 */ > >> + input->arg = devirt_cr3arg; /* PA of trampoline page table L4 */ > >> + > >> + status = hv_do_hypercall(HVCALL_DISABLE_HYP_EX, input, NULL); > >> + if (!hv_result_success(status)) { > >> + pr_emerg("%s: %s\n", __func__, hv_result_to_string(status)); > >> + pr_emerg("Hyper-V: disable hyp failed. kexec not possible\n"); > > > > These prints won't ever be printed to any console as prints in NMI > > handler are deffered. > > It's mostly for debug. There are different config options allowing one > to build kernel easily dumping to either uart, led, speaker etc... There > are no easy ways to debug. kernel debuggers could trap EMERGENCY printks > also... > > Are you 100% sure printk is async even if KERN_EMERG? If yes, I'd like to > propose someday to make it bypass all that for pr_emerg. >
Yes, I'm quite sure. Right now this looks like is dead code. > > > Also, how are they aligned with the notice in the comment on top of > > the function stating that console output would lead to synic ipi call? > > Comment says "Hypervisor Crash". Please reread the whole block. > The comment states that in case of hypervisor crash "console output from here would result in synic ipi hcall, which would result in reboot". So, why printing anything if it will simply lead to reboot? > > > > Resetting the machine from an NMI handler is sloppy. > > There could be another NMI, which triggers the panic, leading to this > > handler. > > NMI handlers servicing is batched meanining that not only this handler > > won't output anything, but also any other prints from any other handlers > > executed before the same lock won't be written out to consoles. > > > > This introduces silent machine resets for the root partition. Can the > > intrusive logic me moved to a tasklet? > > I really don't think you understand what is going on here. I've tried > telling you at least once in the past year, there is no return from the nmi > handler in case of hyp crash, and that this is panic mode, something > really bad has happened! It could be memory corruption, it could be > hw failure... The hyp goes in emergency mode that just mostly loops, > handling tiny number of hypercalls and msrs for support of dom0/root > like windows that implements custom core collection in raw mode. > I wasn't clear. I wasn't talking about a hypervisor crash. If it is so intrusive, that an attempt to print things to console may lead to reboot, then there should be no prints for this case. But this same logic is also used for Linux crashes, when prints can and should be printed to console. Moreover, whe same logic is used for a case when there is no crash kernel loaded, which as I said already leads to silent reboot if panic has happened in NMI handler. I believe this needs to be fixed. Stas