On Fri, Jan 30, 2026 at 08:32:45PM +0000, Anirudh Rayabharam wrote:
> On Fri, Jan 30, 2026 at 10:46:45AM -0800, Stanislav Kinsburskii wrote:
> > On Fri, Jan 30, 2026 at 05:11:12PM +0000, Anirudh Rayabharam wrote:
> > > On Wed, Jan 28, 2026 at 03:11:14PM -0800, Stanislav Kinsburskii wrote:
> > > > On Wed, Jan 28, 2026 at 04:16:31PM +0000, Anirudh Rayabharam wrote:
> > > > > On Mon, Jan 26, 2026 at 12:46:44PM -0800, Stanislav Kinsburskii wrote:
> > > > > > On Tue, Jan 27, 2026 at 12:19:24AM +0530, Anirudh Rayabharam wrote:
> > > > > > > On Fri, Jan 23, 2026 at 10:20:53PM +0000, Stanislav Kinsburskii wrote:
> > > > > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > > > > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > > > > > > > hypervisor deposited pages.
> > > > > > > > 
> > > > > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > > > > management is implemented.
> > > > > > > 
> > > > > > > Someone might want to stop all guest VMs and do a kexec. Which is valid
> > > > > > > and would work without any issue for L1VH.
> > > > > > > 
> > > > > > 
> > > > > > No, it won't work, and hypervisor-deposited pages won't be withdrawn.
> > > > > 
> > > > > All pages that were deposited in the context of a guest partition (i.e.
> > > > > with the guest partition ID), would be withdrawn when you kill the VMs,
> > > > > right? What other deposited pages would be left?
> > > > > 
> > > > 
> > > > The driver deposits two types of pages: one for the guests (withdrawn
> > > > upon guest shutdown) and the other for the host itself (never
> > > > withdrawn).
> > > > See hv_call_create_partition, for example: it deposits pages for the
> > > > host partition.
> > > 
> > > Hmm.. I see. Is it not possible to reclaim this memory in module_exit?
> > > Also, can't we forcefully kill all running partitions in module_exit and
> > > then reclaim memory? Would this help with kernel consistency
> > > irrespective of userspace behavior?
> > > 
> > 
> > It would, but this is sloppy and cannot be a long-term solution.
> > 
> > It is also not reliable. We have no hook to prevent kexec. So if we fail
> > to kill the guest or reclaim the memory for any reason, the new kernel
> > may still crash.
> 
> Actually guests won't be running by the time we reach our module_exit
> function during a kexec. Userspace processes would've been killed by
> then.
> 

No, they will not: "kexec -e" doesn't kill user processes.
We must not rely on the OS to do a graceful shutdown before doing
kexec.

> Also, why is this sloppy? Isn't this what module_exit should be
> doing anyway? If someone unloads our module we should be trying to
> clean everything up (including killing guests) and reclaim memory.
> 

Kexec does not unload modules, but it wouldn't really matter even if it
did.
There are other means to plug into the reboot flow, but none of them
is robust or reliable.

> In any case, we can BUG() out if we fail to reclaim the memory. That would
> stop the kexec.
> 

By killing the whole system? This is not a good user experience and I
don't see how this can be justified.

> This is a better solution than disabling KEXEC outright: our
> driver makes the best possible effort to make kexec work.
> 

How is an unreliable feature leading to potential system crashes better
than disabling kexec outright?

It's the complete opposite for me: the latter provides limited
but robust functionality, while the former provides unreliable and
unpredictable behavior.

> > 
> > There are two long-term solutions:
> >  1. Add a way to prevent kexec when there is shared state between the
> >     hypervisor and the kernel.
> 
> I honestly think we should focus efforts on making kexec work rather
> than finding ways to prevent it.
> 

There is no argument about it. But until we have it fixed properly, we
have two options: either disable kexec or stop claiming we have our
driver up and ready for external customers. Given the importance of
this driver for current projects, I believe the better way would be to
explicitly limit the functionality instead of postponing the
productization of the driver.

In other words, this is not about our feelings about kexec support: it's
about what we can reliably provide to our customers today.
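
As a rough illustration of what the restriction means in practice, assuming
the "depends on !KEXEC" line from this patch is applied, a .config that
enables the root partition driver would need to look roughly like the
fragment below. The "=m" choice and the omission of the other MSHV_ROOT
dependencies are assumptions made for brevity:

```
# With this patch applied, KEXEC has to be off for MSHV_ROOT to be selectable
# CONFIG_KEXEC is not set
# Assumed choice: built as a module; other MSHV_ROOT dependencies omitted
CONFIG_MSHV_ROOT=m
```

With CONFIG_KEXEC=y the MSHV_ROOT option would have an unmet dependency and
would not be offered, so the limitation is visible at configuration time
rather than at runtime.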

Thanks,
Stanislav

> Thanks,
> Anirudh
> 
> >  2. Hand the shared kernel state over to the new kernel.
> > 
> > I sent a series for the first one. The second one is not ready yet.
> > Anything else is neither robust nor reliable, so I don’t think it makes
> > sense to pursue it.
> > 
> > Thanks,
> > Stanislav
> > 
> > 
> > > Thanks,
> > > Anirudh.
> > > 
> > > > 
> > > > Thanks,
> > > > Stanislav
> > > > 
> > > > > Thanks,
> > > > > Anirudh.
> > > > > 
> > > > > > Also, kernel consistency must not depend on userspace behavior.
> > > > > > 
> > > > > > > Also, I don't think it is reasonable at all that someone needs to
> > > > > > > disable basic kernel functionality such as kexec in order to use our
> > > > > > > driver.
> > > > > > > 
> > > > > > 
> > > > > > It's a temporary measure until proper page lifecycle management is
> > > > > > supported in the driver.
> > > > > > Mutual exclusion of the driver and kexec is a given and thus should be
> > > > > > explicitly stated in the Kconfig.
> > > > > > 
> > > > > > Thanks,
> > > > > > Stanislav
> > > > > > 
> > > > > > > Thanks,
> > > > > > > Anirudh.
> > > > > > > 
> > > > > > > > 
> > > > > > > > Signed-off-by: Stanislav Kinsburskii <[email protected]>
> > > > > > > > ---
> > > > > > > >  drivers/hv/Kconfig |    1 +
> > > > > > > >  1 file changed, 1 insertion(+)
> > > > > > > > 
> > > > > > > > diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> > > > > > > > index 7937ac0cbd0f..cfd4501db0fa 100644
> > > > > > > > --- a/drivers/hv/Kconfig
> > > > > > > > +++ b/drivers/hv/Kconfig
> > > > > > > > @@ -74,6 +74,7 @@ config MSHV_ROOT
> > > > > > > >         # e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> > > > > > > >         # no particular order, making it impossible to reassemble larger pages
> > > > > > > >         depends on PAGE_SIZE_4KB
> > > > > > > > +       depends on !KEXEC
> > > > > > > >         select EVENTFD
> > > > > > > >         select VIRT_XFER_TO_GUEST_WORK
> > > > > > > >         select HMM_MIRROR
> > > > > > > > 
> > > > > > > > 
