Main changes from v2 [1]:
 - Rebase on top of KHO+LUO. Instead of being a standalone solution,
   KSTATE now relies on KHO for preserving memory.

Main changes from v1 [2]:
 - Get rid of abusing crashkernel and implement a proper way to pass
   memory to the new kernel.
 - Lots of misc cleanups/refactoring.

Series dependencies:
 - is_kho_boot() - https://lkml.kernel.org/r/cover.1755721529.git.epet...@amazon.de
 - LUO v3 series - https://lkml.kernel.org/r/20250807014442.3829950-1-pasha.tatas...@soleen.com

GIT:
  git fetch https://github.com/aryabinin/linux.git kstate-v3

TODO:
 - KSTATE currently has only one global stream of data. We need to add
   substreams (kinda like subtrees in FDT) and integrate them with LUO,
   so we could have per-file streams. That is planned to be fixed in v4.

KSTATE (kernel state) is a mechanism to describe some part of the internal
kernel state, save it into the memory preserved by KHO and restore the
state after kexec in the new kernel.

The end goal here is to be able to update the host kernel under VMs with
VFIO pass-through devices running on that host. This implies that we need
to save/restore a lot of different structs/state across different
subsystems.

The purpose of KSTATE is to provide common infrastructure for
saving/restoring complex in-kernel state. Currently KHO uses FDT for that
purpose; KSTATE aims to provide an easier-to-use alternative.

In this series KSTATE provides an alternative to FDT usage in KHO, without
replacing it completely. So both can be used, and FDT users can be
converted to KSTATE later if needed.

As a demonstration, memblock's reserved tables are converted from FDT to
KSTATE, making the code simpler and smaller:

 include/linux/kstate.h |   1
 mm/memblock.c          | 158
 2 files changed, 49 insertions(+), 110 deletions(-)

The idea behind KSTATE resembles QEMU's migration framework [3], which
solves a quite similar problem - migrating the state of a VM/emulated
devices across different versions of QEMU.

So why not use FDT?

 - The main reason is that FDT doesn't provide a simple and convenient
   internal API for drivers/subsystems to preserve internal data. E.g.
let's consider we have some variable of type 'struct a' that needs to be
preserved:

	struct a {
		int i;
		unsigned long *p_ulong;
		char s[10];
		struct folio *folio;
	};

The FDT way requires the driver/subsystem to have a bunch of code dealing
with FDT stuff, something like:

	a_kho_write()
	{
		...
		/* one string-keyed property per field */
		err |= fdt_property(fdt, "i", &a.i, sizeof(a.i));
		err |= fdt_property(fdt, "ulong", a.p_ulong, sizeof(*a.p_ulong));
		err |= fdt_property(fdt, "s", &a.s, sizeof(a.s));
		if (err)
			...
	}

	a_kho_restore()
	{
		...
		/* fdt_getprop() returns a pointer to the property value */
		p = fdt_getprop(fdt, offset, "i", &len);
		if (!p || len != sizeof(a.i))
			goto err;
		a.i = *p;
		*a.p_ulong = fdt_getprop....
	}

Each driver/subsystem has to solve this problem in its own way. Also, if
we use FDT properties for individual fields, that might be wasteful in
terms of used memory, as these properties use strings as keys.

KSTATE solves the same problem in a more elegant way, with this:

	struct kstate_description a_state = {
		.name = "a_struct",
		.version_id = 1,
		.id = KSTATE_TEST_ID,
		.state_list = LIST_HEAD_INIT(a_state.state_list),
		.fields = (const struct kstate_field[]) {
			KSTATE_BASE_TYPE(i, struct a, int),
			KSTATE_BASE_TYPE(s, struct a, char [10]),
			KSTATE_POINTER(p_ulong, struct a),
			KSTATE_FOLIO(folio, struct a),
			KSTATE_END_OF_LIST()
		},
	};

saving:

	{
		static unsigned long ulong;
		static struct a a_data = { .p_ulong = &ulong };
		const int a_data_instance_id = 123;

		kstate_register(&a_state, &a_data, a_data_instance_id);
	}

restoring:

	{
		static unsigned long ulong;
		static struct a a_data = { .p_ulong = &ulong };
		const int a_data_instance_id = 123;

		kstate_restore(&a_state, &a_data, a_data_instance_id);
	}

The driver only needs to have a proper 'kstate_description' and provide
some ID that uniquely identifies 'a_data' among other instances of
'struct a'. Then it calls kstate_register(), which registers a_data to be
saved at the KHO-finalize stage of the kexec reboot. After reboot, the
kstate_restore() call should restore all parts of a_data, in accordance
with the kstate_description.
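
To illustrate why a table of field descriptions is enough to drive generic
save/restore code, here is a minimal userspace sketch. It is not the code
from this series: the F_* flags, 'struct field', the FIELD_*() macros and
fields_demo() are hypothetical names made up for the illustration, and only
by-value and pointer fields are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flags, loosely modelled on KS_BASE_TYPE / KS_POINTER. */
enum { F_BASE = 1, F_POINTER = 2, F_END = 4 };

struct field {
	size_t offset;	/* offset of the member inside the parent struct */
	size_t size;	/* number of bytes to copy */
	int flags;
};

/* Member copied by value. */
#define FIELD_BASE(member, stype) \
	{ offsetof(stype, member), sizeof(((stype *)0)->member), F_BASE }
/* Member is a pointer; the pointed-to object is copied. */
#define FIELD_POINTER(member, stype) \
	{ offsetof(stype, member), sizeof(*((stype *)0)->member), F_POINTER }
#define FIELD_END() { 0, 0, F_END }

struct a {
	int i;
	unsigned long *p_ulong;
	char s[10];
};

static const struct field a_fields[] = {
	FIELD_BASE(i, struct a),
	FIELD_BASE(s, struct a),
	FIELD_POINTER(p_ulong, struct a),
	FIELD_END(),
};

/* Resolve where the data of one field lives for the object @obj. */
static void *field_data(const struct field *f, void *obj)
{
	char *p = (char *)obj + f->offset;
	void *target;

	if (!(f->flags & F_POINTER))
		return p;
	/* Follow the pointer member (assumes uniform pointer representation). */
	memcpy(&target, p, sizeof(target));
	return target;
}

/* Walk the table and pack every field into @buf; returns bytes written. */
static size_t save(const struct field *f, void *obj, unsigned char *buf)
{
	size_t pos = 0;

	for (; !(f->flags & F_END); f++) {
		memcpy(buf + pos, field_data(f, obj), f->size);
		pos += f->size;
	}
	return pos;
}

/* Walk the same table and unpack @buf into @obj; returns bytes read. */
static size_t restore(const struct field *f, void *obj,
		      const unsigned char *buf)
{
	size_t pos = 0;

	for (; !(f->flags & F_END); f++) {
		memcpy(field_data(f, obj), buf + pos, f->size);
		pos += f->size;
	}
	return pos;
}

/* Round trip: returns 0 if the restored object matches the saved one. */
int fields_demo(void)
{
	unsigned long src_val = 42, dst_val = 0;
	struct a src = { .i = 7, .p_ulong = &src_val, .s = "kstate" };
	struct a dst = { .p_ulong = &dst_val };
	unsigned char buf[64];
	size_t n = save(a_fields, &src, buf);

	assert(n == sizeof(int) + sizeof(src.s) + sizeof(unsigned long));
	assert(restore(a_fields, &dst, buf) == n);
	return (dst.i == 7 && dst_val == 42 && !strcmp(dst.s, "kstate")) ? 0 : -1;
}
```

Note that the packed data carries no per-field keys: both sides walk the
same table in the same order, which is what avoids the string-key overhead
mentioned above.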
So basically 'struct kstate_description' provides instructions on how to
save/restore 'struct a'.

So now to the part about how this works.

The state of kernel data (usually it's some struct) is described by the
'struct kstate_description', which contains an array of individual field
descriptions - 'struct kstate_field'. Each field has a set of bits in
->flags which instruct how to save/restore a certain field of the struct.
E.g.:

 - KS_BASE_TYPE - the field can be just copied by value.
 - KS_POINTER - the struct member is a pointer to the actual data, so it
   needs to be dereferenced before saving/restoring data to/from the
   kstate data stream.
 - KS_STRUCT - the field contains another struct; field->ksd must point
   to another 'struct kstate_description'.
 - KS_CUSTOM - some non-trivial field that requires custom
   kstate_field->save()/->restore() callbacks to save/restore data.
 - KS_ARRAY_OF_POINTER - array of pointers; the size of the array is
   determined by the field->count() callback.
 - KS_ADDRESS - the field is a pointer to either the vmemmap area
   (struct page) or a linear address. Stored as an offset from the base
   address.
 - KS_END - special flag indicating the end of migration stream data.

The kstate_register() call accepts a kstate_description along with an
instance of an object and registers it in the global 'states' list.

During the 'finalize' phase of KHO we go through the list of
'kstate_description's, and each instance of kstate_description forms a
'struct kstate_entry' which is saved into the kstate's data stream.

The 'kstate_entry' contains information like the ID of the
kstate_description, its version, the size of the migration data and the
data itself. The ->data is formed in accordance with the kstate_fields of
the corresponding kstate_description.

After the reboot, when kstate_restore() is called, it parses KSTATE's data
stream, finds the appropriate 'kstate_entry' and restores the contents of
the object in accordance with the kstate_description and its ->fields.

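
The entry framing described above can be sketched in userspace as follows.
This is only an illustration under assumptions: 'struct entry_hdr', the
ENTRY_END terminator id and the stream_*() helpers are hypothetical, and
the real 'struct kstate_entry' layout in the series may differ. The
payload is treated as opaque bytes here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-stream header; the real 'struct kstate_entry' may differ. */
struct entry_hdr {
	uint32_t id;		/* which description produced the data */
	uint32_t instance_id;	/* which object of that type */
	uint32_t version;	/* version of the description */
	uint32_t size;		/* payload bytes following the header */
};

#define ENTRY_END 0u	/* hypothetical terminator id, standing in for KS_END */

/* Append one entry at @pos; returns the new write position. */
static size_t stream_put(unsigned char *stream, size_t pos, uint32_t id,
			 uint32_t instance_id, uint32_t version,
			 const void *data, uint32_t size)
{
	struct entry_hdr hdr = { id, instance_id, version, size };

	memcpy(stream + pos, &hdr, sizeof(hdr));
	memcpy(stream + pos + sizeof(hdr), data, size);
	return pos + sizeof(hdr) + size;
}

/* Terminate the stream so that readers know where to stop. */
static size_t stream_end(unsigned char *stream, size_t pos)
{
	struct entry_hdr hdr = { ENTRY_END, 0, 0, 0 };

	memcpy(stream + pos, &hdr, sizeof(hdr));
	return pos + sizeof(hdr);
}

/* Find the payload of (id, instance_id); returns NULL if it is absent. */
static const unsigned char *stream_find(const unsigned char *stream,
					uint32_t id, uint32_t instance_id,
					struct entry_hdr *out)
{
	size_t pos = 0;

	for (;;) {
		memcpy(out, stream + pos, sizeof(*out));
		if (out->id == ENTRY_END)
			return NULL;
		if (out->id == id && out->instance_id == instance_id)
			return stream + pos + sizeof(*out);
		/* Not ours: skip header and payload. */
		pos += sizeof(*out) + out->size;
	}
}

/* Save two instances, then restore the second one; returns 0 on success. */
int stream_demo(void)
{
	unsigned char stream[128];
	struct entry_hdr hdr;
	const unsigned char *payload;
	int first = 11, second = 22, restored = 0;
	size_t pos = 0;

	pos = stream_put(stream, pos, 1, 123, 1, &first, sizeof(first));
	pos = stream_put(stream, pos, 1, 124, 1, &second, sizeof(second));
	pos = stream_end(stream, pos);
	assert(pos <= sizeof(stream));

	payload = stream_find(stream, 1, 124, &hdr);
	if (!payload || hdr.version != 1 || hdr.size != sizeof(restored))
		return -1;
	memcpy(&restored, payload, hdr.size);
	if (restored != 22)
		return -1;
	/* An instance that was never saved must not be found. */
	return stream_find(stream, 1, 999, &hdr) ? -1 : 0;
}
```

Keeping the size in the header lets a reader skip entries it does not care
about without knowing their field layout, and the version gives the restore
side a place to reject or adapt data written by an older description.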
[1] https://lkml.kernel.org/r/20250310120318.2124-1-a...@yandex-team.com
[2] https://lkml.kernel.org/r/20241002160722.20025-1-a...@yandex-team.com
[3] https://www.qemu.org/docs/master/devel/migration/main.html#vmstate

Andrey Ryabinin (7):
  kho: move fdt setup in separate helper.
  kho: move scratch memory in separate helper.
  kstate: Add KSTATE - [de]serialization framework for KHO
  kho: replace KHO FDT with kstate metadata
  kstate, test: add test module for testing kstate subsystem.
  mm/memblock: Use KSTATE instead of kho to preserve preserved_mem_table
  Documentation, kstate: Add KSTATE documentation

 Documentation/core-api/index.rst       |   1 +
 Documentation/core-api/kstate.rst      | 117 ++++++
 MAINTAINERS                            |   8 +
 arch/x86/include/uapi/asm/setup_data.h |   4 +-
 arch/x86/kernel/kexec-bzimage64.c      |   6 +-
 arch/x86/kernel/setup.c                |   3 +-
 drivers/of/fdt.c                       |   6 +-
 include/linux/kexec.h                  |   2 +-
 include/linux/kstate.h                 | 235 +++++++++++
 kernel/liveupdate/Kconfig              |  16 +
 kernel/liveupdate/Makefile             |   2 +
 kernel/liveupdate/kexec_handover.c     |  95 ++++-
 kernel/liveupdate/kstate.c             | 536 +++++++++++++++++++++++++
 lib/Makefile                           |   2 +
 lib/test_kstate.c                      | 116 ++++++
 mm/memblock.c                          | 158 +++-----
 16 files changed, 1174 insertions(+), 133 deletions(-)
 create mode 100644 Documentation/core-api/kstate.rst
 create mode 100644 include/linux/kstate.h
 create mode 100644 kernel/liveupdate/kstate.c
 create mode 100644 lib/test_kstate.c

-- 
2.49.1