Ackerley Tng <[email protected]> writes:

> Tarun Sahu <[email protected]> writes:
>
>> Add a new KVM selftest `guest_memfd_preservation_test` to verify that
>> guest memory backed by guest_memfd is preserved properly.
>>
>
> Don't think using backticks in commit messages is a common practice but
> I might be wrong here.
>
>> The test leverages the Live Update Orchestrator (LUO) infrastructure
>> to validate that memory folios and configuration layouts are
>> successfully saved and then restored during kernel live updates,
>> preventing any memory loss for the guest.
>>
>> Here, I have used the kvm selftests framework by creating a new
>> vm and mapping two memory slots to it. One is the code that is executed
>> inside the vm and other is the guest_memfd whose memory is being
>> written by the guest code.
>>
>
> Don't think commit messages with "I" are common either


Will take care of commit message everywhere as per the guidelines,
Sorry about that.
>
>> In Phase 1: Once data is written the vm exits and wait for the user
>> to trigger the kexec.
>>
>> In Phase 2: A new vm is created with retrieved kvm and again two
>> memory slots are assigned. Once for guest code, and another is for
>> retrieved guest_memfd where guest_memfd memory is verified by the
>> executed guest code. If verification succeeds, The test passes.
>>
>>
>> [...snip...]
>>
>> +#define SESSION_NAME "gmem_vm_preservation_session"
>> +#define VM_TOKEN 0x1001
>> +#define GMEM_TOKEN 0x1002
>> +
>> +#define GMEM_SIZE (16ULL * 1024 * 1024)
>> +#define DATA_SIZE (5ULL * 1024 * 1024)
>> +
>> +static size_t page_size;
>> +
>> +/* Deterministic byte pattern generation based on offset */
>> +static inline uint8_t get_pattern_byte(size_t offset)
>> +{
>> +    return (uint8_t)(offset ^ 0x5A);
>> +}
>> +
>> +static void guest_code_phase1(uint64_t gpa, uint64_t size, uint64_t 
>> data_size)
>> +{
>> +    uint8_t *mem = (uint8_t *)gpa;
>> +    size_t i;
>> +
>> +    for (i = 0; i < data_size; i++)
>> +            mem[i] = get_pattern_byte(i);
>> +
>> +    GUEST_DONE();
>> +}
>> +
>> +static void guest_code_phase2(uint64_t gpa, uint64_t size, uint64_t 
>> data_size)
>> +{
>> +    uint8_t *mem = (uint8_t *)gpa;
>> +    size_t i;
>> +
>> +    for (i = 0; i < data_size; i++) {
>> +            uint8_t val = get_pattern_byte(i);
>> +
>> +            __GUEST_ASSERT(mem[i] == val,
>> +                           "Data mismatch at offset %lu! Expected 0x%x, got 
>> 0x%x",
>> +                           i, val, mem[i]);
>> +    }
>> +
>> +    GUEST_DONE();
>> +}
>> +
>> +static void do_phase1(void)
>> +{
>> +    uint64_t flags = GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED;
>
> Is there a reason to set GUEST_MEMFD_FLAG_MMAP? We're not really
> accessing that memory from the host in this test.

Right, We can skip it.

>
>> +    int gmem_fd, dev_luo_fd, session_fd, ret;
>> +    const uint64_t gpa = SZ_4G;
>> +    struct kvm_vcpu *vcpu;
>> +    const int slot = 1;
>> +    struct kvm_vm *vm;
>> +
>> +    vm = __vm_create_shape_with_one_vcpu(VM_SHAPE_DEFAULT, &vcpu, 1,
>> +                                    guest_code_phase1);
>> +    gmem_fd = vm_create_guest_memfd(vm, GMEM_SIZE, flags);
>> +    vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, 
>> GMEM_SIZE, NULL,
>> +                             gmem_fd, 0);
>> +
>> +    for (size_t i = 0; i < GMEM_SIZE; i += page_size)
>> +            virt_pg_map(vm, gpa + i, gpa + i);
>> +
>> +    vcpu_args_set(vcpu, 3, gpa, GMEM_SIZE, DATA_SIZE);
>
> If GMEM_SIZE and DATA_SIZE are static I think we don't have to set those
> as vcpu_args_set(), they can be used as macros from within the guest.

Yes, There are multiple places we can skip it, Like passing them as the
argument in the guest_code_phase1/2. Will update it.

>
>> +
>> +    vcpu_run(vcpu);
>> +    TEST_ASSERT_EQ(get_ucall(vcpu, NULL), UCALL_DONE);
>> +
>> +    dev_luo_fd = luo_open_device();
>> +    TEST_ASSERT(dev_luo_fd >= 0, "Failed to open /dev/liveupdate");
>> +
>> +    session_fd = luo_create_session(dev_luo_fd, SESSION_NAME);
>> +    TEST_ASSERT(session_fd >= 0, "Failed to create LUO session");
>> +
>> +    ret = luo_session_preserve_fd(session_fd, vm->fd, VM_TOKEN);
>> +    TEST_ASSERT(ret == 0, "Failed to preserve VM file descriptor");
>> +
>> +    ret = luo_session_preserve_fd(session_fd, gmem_fd, GMEM_TOKEN);
>> +    TEST_ASSERT(ret == 0, "Failed to preserve guest_memfd file descriptor");
>> +
>
> Thanks for showing how this works :)

Glad to know. it helped.
. .
 v


>
>> +    
>> printf("\n============================================================\n");
>> +    printf("Phase 1 Complete Successfully!\n");
>> +    printf("VM file and guest_memfd file have been preserved via LUO.\n");
>> +    printf("Tokens: VM_TOKEN=0x%x, GMEM_TOKEN=0x%x\n", VM_TOKEN, 
>> GMEM_TOKEN);
>> +    printf("Machine Size: %llu MB, Data Size: %llu MB\n", GMEM_SIZE / SZ_1M,
>> +                             DATA_SIZE / SZ_1M);
>> +    
>> printf("------------------------------------------------------------\n");
>> +
>> +    daemonize_and_wait();
>> +}
>> +
>> +static struct kvm_vm *vm_create_from_fd(int resurrected_vm_fd,
>> +                                    struct vm_shape shape)
>> +{
>> +    struct kvm_vm *vm;
>> +
>> +    vm = calloc(1, sizeof(*vm));
>> +    TEST_ASSERT(vm != NULL, "Insufficient Memory");
>> +
>> +    vm_init_fields(vm, shape);
>
> What would happen if the shape was changed between preserving and
> restoring?
Shape must be consistent across kexec. It may break. struct vm_shape
includes two fields
1. mode: 4k, 16k or 64k mapping, how many page table bit etc.
         which are used to setup mapping with guest code, and memory.
         guest_memfd does not support mapping other than PAGE_SIZE.
2. type: This is userspace side of the vm type. And it will not have
         information about the preserved vm_type via vm_file (kernel).
         If userspace changes this on stage2, some action might not
         work. for example, if userspace is expecting vm_type to be
         COCO VM and preserved vm_type is shared, this will have
         conflict when userspace will try to perform some operation that
         only works with COCO VM.


My Question is: Why should someone change the shape, unless they are
                planning to make the vm fail?

>
>> +
>> +    vm->kvm_fd = open_path_or_exit(KVM_DEV_PATH, O_RDWR);
>> +    vm->fd = resurrected_vm_fd;
>> +
>> +    if (kvm_has_cap(KVM_CAP_BINARY_STATS_FD))
>> +            vm->stats.fd = vm_get_stats_fd(vm);
>> +    else
>> +            vm->stats.fd = -1;
>> +
>> +    vm_init_memory_properties(vm);
>> +
>> +    return vm;
>> +}
>> +
>
> I think vm_create_from_fd() could be introduced in an earlier patch to
> reduce the amount of new code in this patch. Also, I think it could
> perhaps be moved to kvm_util.c assuming that other test will use it too.
>
Makes sense, Will take of it next revision v4.


>> +static void do_phase2(void)
>> +{
>> +    int retrieved_vm_fd, retrieved_gmem_fd, dev_luo_fd, session_fd;
>> +    struct vm_shape shape = VM_SHAPE_DEFAULT;
>> +    const uint64_t gpa = SZ_4G;
>> +    struct kvm_vcpu *vcpu;
>> +    const int slot = 1;
>> +    struct kvm_vm *vm;
>> +
>> +    dev_luo_fd = luo_open_device();
>> +    TEST_ASSERT(dev_luo_fd >= 0, "Failed to open /dev/liveupdate");
>> +
>> +    session_fd = luo_retrieve_session(dev_luo_fd, SESSION_NAME);
>> +    TEST_ASSERT(session_fd >= 0, "Failed to retrieve LUO session");
>> +
>> +    retrieved_vm_fd = luo_session_retrieve_fd(session_fd, VM_TOKEN);
>> +    TEST_ASSERT(retrieved_vm_fd >= 0, "Failed to retrieve VM file 
>> descriptor");
>> +
>> +    retrieved_gmem_fd = luo_session_retrieve_fd(session_fd, GMEM_TOKEN);
>> +    TEST_ASSERT(retrieved_gmem_fd >= 0, "Failed to retrieve guest_memfd 
>> file descriptor");
>> +
>> +    vm = vm_create_from_fd(retrieved_vm_fd, shape);
>> +
>> +    u64 nr_pages = 2048; /* 8MB is plenty for slot0 pages */
>> +
>
> I don't think declarations are usually mixed with regular code.
Okay, will udpate that.

>
>> +    vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, 0, 0, nr_pages, 
>> 0);
>> +    kvm_vm_elf_load(vm, program_invocation_name);
>> +
>> +    for (int i = 0; i < NR_MEM_REGIONS; i++)
>> +            vm->memslots[i] = 0;
>> +
>> +    struct userspace_mem_region *slot0 = memslot2region(vm, 0);
>> +
>> +    ucall_init(vm, slot0->region.guest_phys_addr + 
>> slot0->region.memory_size);
>> +
>> +    vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, 
>> GMEM_SIZE, NULL,
>> +                               retrieved_gmem_fd, 0);
>> +
>> +    for (size_t i = 0; i < GMEM_SIZE; i += page_size)
>> +            virt_pg_map(vm, gpa + i, gpa + i);
>> +
>> +    vcpu = vm_vcpu_add(vm, 0, guest_code_phase2);
>> +    kvm_arch_vm_finalize_vcpus(vm);
>> +
>> +    vcpu_args_set(vcpu, 3, gpa, GMEM_SIZE, DATA_SIZE);
>> +
>> +    printf("Resuming / Running VM in Phase 2...\n");
>> +    vcpu_run(vcpu);
>> +    TEST_ASSERT_EQ(get_ucall(vcpu, NULL), UCALL_DONE);
>> +
>> +    printf("\nSUCCESS: Phase 2 Complete! All 5MB complex data verified 
>> intact!\n");
>> +
>> +    luo_session_finish(session_fd);
>> +    close(session_fd);
>> +    close(dev_luo_fd);
>> +    /* This will also close the vm_fd */
>> +    kvm_vm_free(vm);
>> +    close(retrieved_gmem_fd);
>> +}
>> +
>> +int main(int argc, char *argv[])
>> +{
>> +    bool phase2 = false;
>> +
>> +    TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
>> +    page_size = getpagesize();
>> +
>> +    for (int i = 1; i < argc; i++) {
>> +            if (strcmp(argv[i], "--phase2") == 0)
>> +                    phase2 = true;
>> +    }
>> +
>
> Maybe use getopt() here?

In V3, it is update to use liveupdate library.
>
>> +    if (phase2)
>> +            do_phase2();
>> +    else
>> +            do_phase1();
>> +
>> +    return 0;
>> +}
>> --
>> 2.54.0.1032.g2f8565e1d1-goog
>
> I think we also need tests for trying to allocate while frozen, and
> conversion while frozen, and trying to preserve while preservation is
> not allowed.

Yes, We need those tests. For this series, I wanted to focus on design.
Now that we are aligned, next revision, I will send with more tests.

~Tarun

Reply via email to