Hi Geoff,

On 20/10/15 00:38, Geoff Levand wrote:
> Add three new files, kexec.h, machine_kexec.c and relocate_kernel.S to the
> arm64 architecture that add support for the kexec re-boot mechanism
> (CONFIG_KEXEC) on arm64 platforms.
> 
> Signed-off-by: Geoff Levand <[email protected]>
> ---
>  arch/arm64/Kconfig                  |  10 +++
>  arch/arm64/include/asm/kexec.h      |  48 +++++++++++
>  arch/arm64/kernel/Makefile          |   2 +
>  arch/arm64/kernel/cpu-reset.S       |   2 +-
>  arch/arm64/kernel/machine_kexec.c   | 141 +++++++++++++++++++++++++++++++
>  arch/arm64/kernel/relocate_kernel.S | 163 ++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/kexec.h          |   1 +
>  7 files changed, 366 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm64/include/asm/kexec.h
>  create mode 100644 arch/arm64/kernel/machine_kexec.c
>  create mode 100644 arch/arm64/kernel/relocate_kernel.S
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 07d1811..73e8e31 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -491,6 +491,16 @@ config SECCOMP
>         and the task is only allowed to execute a few safe syscalls
>         defined by each seccomp mode.
>  
> +config KEXEC
> +     depends on (!SMP || PM_SLEEP_SMP)

Commit 4b3dc9679cf7 got rid of '!SMP', so I think this only needs to depend
on PM_SLEEP_SMP.


> +     select KEXEC_CORE
> +     bool "kexec system call"
> +     ---help---
> +       kexec is a system call that implements the ability to shutdown your
> +       current kernel, and to start another kernel.  It is like a reboot
> +       but it is independent of the system firmware.   And like a reboot
> +       you can start any kernel with it, not just Linux.
> +
>  config XEN_DOM0
>       def_bool y
>       depends on XEN
> diff --git a/arch/arm64/kernel/cpu-reset.S b/arch/arm64/kernel/cpu-reset.S
> index ffc9e385e..7cc7f56 100644
> --- a/arch/arm64/kernel/cpu-reset.S
> +++ b/arch/arm64/kernel/cpu-reset.S
> @@ -3,7 +3,7 @@
>   *
>   * Copyright (C) 2001 Deep Blue Solutions Ltd.
>   * Copyright (C) 2012 ARM Ltd.
> - * Copyright (C) 2015 Huawei Futurewei Technologies.
> + * Copyright (C) Huawei Futurewei Technologies.

Move this hunk into the patch that adds the file?


>   *
>   * This program is free software; you can redistribute it and/or modify
>   * it under the terms of the GNU General Public License version 2 as
> diff --git a/arch/arm64/kernel/relocate_kernel.S b/arch/arm64/kernel/relocate_kernel.S
> new file mode 100644
> index 0000000..7b07a16
> --- /dev/null
> +++ b/arch/arm64/kernel/relocate_kernel.S
> @@ -0,0 +1,163 @@
> +/*
> + * kexec for arm64
> + *
> + * Copyright (C) Linaro.
> + * Copyright (C) Huawei Futurewei Technologies.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/kexec.h>
> +
> +#include <asm/assembler.h>
> +#include <asm/kexec.h>
> +#include <asm/memory.h>
> +#include <asm/page.h>
> +
> +
> +/*
> + * arm64_relocate_new_kernel - Put a 2nd stage kernel image in place and boot it.
> + *
> + * The memory that the old kernel occupies may be overwritten when copying the
> + * new image to its final location.  To assure that the
> + * arm64_relocate_new_kernel routine which does that copy is not overwritten,
> + * all code and data needed by arm64_relocate_new_kernel must be between the
> + * symbols arm64_relocate_new_kernel and arm64_relocate_new_kernel_end.  The
> + * machine_kexec() routine will copy arm64_relocate_new_kernel to the kexec
> + * control_code_page, a special page which has been set up to be preserved
> + * during the copy operation.
> + */
> +.globl arm64_relocate_new_kernel
> +arm64_relocate_new_kernel:
> +
> +     /* Setup the list loop variables. */
> +     ldr     x18, .Lkimage_head              /* x18 = list entry */
> +     dcache_line_size x17, x0                /* x17 = dcache line size */
> +     mov     x16, xzr                        /* x16 = segment start */
> +     mov     x15, xzr                        /* x15 = entry ptr */
> +     mov     x14, xzr                        /* x14 = copy dest */
> +
> +     /* Check if the new image needs relocation. */
> +     cbz     x18, .Ldone
> +     tbnz    x18, IND_DONE_BIT, .Ldone
> +
> +.Lloop:
> +     and     x13, x18, PAGE_MASK             /* x13 = addr */
> +
> +     /* Test the entry flags. */
> +.Ltest_source:
> +     tbz     x18, IND_SOURCE_BIT, .Ltest_indirection
> +
> +     mov x20, x14                            /*  x20 = copy dest */
> +     mov x21, x13                            /*  x21 = copy src */
> +
> +     /* Invalidate dest page to PoC. */
> +     mov     x0, x20
> +     add     x19, x0, #PAGE_SIZE
> +     sub     x1, x17, #1
> +     bic     x0, x0, x1
> +1:   dc      ivac, x0
> +     add     x0, x0, x17
> +     cmp     x0, x19
> +     b.lo    1b
> +     dsb     sy

If I've followed all this through properly:

With KVM - mmu+caches are configured, but then disabled by 'kvm: allows kvm
cpu hotplug'. This 'arm64_relocate_new_kernel' function then runs at EL2
with M=0, C=0, I=0.

Without KVM - when there is no user of EL2, the mmu+caches are left in
whatever state the bootloader (or efi stub) left them in. From
Documentation/arm64/booting.txt:
> Instruction cache may be on or off.
and
> System caches which respect the architected cache maintenance by VA
> operations must be configured and may be enabled.

So 'arm64_relocate_new_kernel' function could run at EL2 with M=0, C=?, I=?.

I think this means you can't guarantee that anything you are copying below
actually makes it through the caches to memory - secondary processors that
boot later may get stale values.

The EFI stub disables the M and C bits when booted at EL2 with UEFI - but
it leaves the instruction cache enabled. You only clean the
reboot_code_buffer from the data cache, so there may be stale values in the
instruction cache.

I think you need to disable the i-cache at EL1. If you jump to EL2, I think
you need to disable the I/C bits there too - as you can't rely on the code
in 'kvm: allows kvm cpu hotplug' to do this in a non-kvm case.
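
To make the state I'm asking for concrete, here is a rough user-space C model
of the SCTLR_ELx value. It is only a sketch: the helper names are made up for
this mail, and the real change would be an mrs/bic/msr/isb sequence in the
assembly, not C. The bit positions (M = bit 0, C = bit 2, I = bit 12) are the
architectural ones.

```c
#include <assert.h>
#include <stdint.h>

/* SCTLR_ELx control bits as defined by the ARMv8-A architecture. */
#define SCTLR_M	(UINT64_C(1) << 0)	/* MMU enable */
#define SCTLR_C	(UINT64_C(1) << 2)	/* data/unified cache enable */
#define SCTLR_I	(UINT64_C(1) << 12)	/* instruction cache enable */

/* Hypothetical helper: true when M=0, C=0 and I=0. */
static int sctlr_is_safe_for_copy(uint64_t sctlr)
{
	return (sctlr & (SCTLR_M | SCTLR_C | SCTLR_I)) == 0;
}

/* Hypothetical helper: the value to write back before the copy runs. */
static uint64_t sctlr_caches_off(uint64_t sctlr)
{
	uint64_t out = sctlr & ~(SCTLR_M | SCTLR_C | SCTLR_I);

	/* All other bits are left exactly as they were found. */
	assert(sctlr_is_safe_for_copy(out));
	return out;
}
```

The point is that all three bits need clearing at whichever exception level
the copy runs at, regardless of what the bootloader or KVM left behind.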


> +
> +     /* Copy page. */
> +1:   ldp     x22, x23, [x21]
> +     ldp     x24, x25, [x21, #16]
> +     ldp     x26, x27, [x21, #32]
> +     ldp     x28, x29, [x21, #48]
> +     add     x21, x21, #64
> +     stnp    x22, x23, [x20]
> +     stnp    x24, x25, [x20, #16]
> +     stnp    x26, x27, [x20, #32]
> +     stnp    x28, x29, [x20, #48]
> +     add     x20, x20, #64
> +     tst     x21, #(PAGE_SIZE - 1)
> +     b.ne    1b
> +
> +     /* dest += PAGE_SIZE */
> +     add     x14, x14, PAGE_SIZE
> +     b       .Lnext
> +
> +.Ltest_indirection:
> +     tbz     x18, IND_INDIRECTION_BIT, .Ltest_destination
> +
> +     /* ptr = addr */
> +     mov     x15, x13
> +     b       .Lnext
> +
> +.Ltest_destination:
> +     tbz     x18, IND_DESTINATION_BIT, .Lnext
> +
> +     mov     x16, x13
> +
> +     /* dest = addr */
> +     mov     x14, x13
> +
> +.Lnext:
> +     /* entry = *ptr++ */
> +     ldr     x18, [x15], #8
> +
> +     /* while (!(entry & DONE)) */
> +     tbz     x18, IND_DONE_BIT, .Lloop
> +
> +.Ldone:
> +     dsb     sy
> +     isb
> +     ic      ialluis
> +     dsb     sy

Why the second dsb?
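
As an aside, to check my reading of the list walk between .Lloop and .Ldone,
this is how I would model it in user-space C. It is a sketch, not a proposed
replacement: walk_kimage_list() is a name I made up, the cache maintenance is
left out, and it assumes the head entry is an indirection entry (which is what
image->head is, as far as I can tell). The IND_* values mirror
include/linux/kexec.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE	4096UL
#define PAGE_MASK	(~(uintptr_t)(PAGE_SIZE - 1))

/* Entry flags, mirroring include/linux/kexec.h. */
#define IND_DESTINATION	(1UL << 0)
#define IND_INDIRECTION	(1UL << 1)
#define IND_DONE	(1UL << 2)
#define IND_SOURCE	(1UL << 3)

/* Model of the relocation loop; returns the number of pages copied. */
static unsigned long walk_kimage_list(uintptr_t head)
{
	uintptr_t entry = head;		/* x18 in the assembly */
	uintptr_t *ptr = NULL;		/* x15 */
	unsigned char *dest = NULL;	/* x14 */
	unsigned long copied = 0;

	if (!entry)
		return 0;

	while (!(entry & IND_DONE)) {
		uintptr_t addr = entry & PAGE_MASK;	/* x13 */

		if (entry & IND_SOURCE) {
			/* Copy one page, then dest += PAGE_SIZE. */
			memcpy(dest, (void *)addr, PAGE_SIZE);
			dest += PAGE_SIZE;
			copied++;
		} else if (entry & IND_INDIRECTION) {
			ptr = (uintptr_t *)addr;	/* ptr = addr */
		} else if (entry & IND_DESTINATION) {
			dest = (unsigned char *)addr;	/* dest = addr */
		}

		/* Model-only guard; the assembly has no such check. */
		assert(ptr != NULL);
		entry = *ptr++;				/* entry = *ptr++ */
	}
	return copied;
}
```

The flags are tested in the same order as the assembly does (source,
indirection, destination).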


> +     isb
> +
> +     /* Start new image. */
> +     ldr     x4, .Lkimage_start
> +     mov     x0, xzr
> +     mov     x1, xzr
> +     mov     x2, xzr
> +     mov     x3, xzr

Once the kexec'd kernel is booting, I get:
> WARNING: x1-x3 nonzero in violation of boot protocol:
>         x1: 0000000080008000
>         x2: 0000000000000020
>         x3: 0000000000000020
> This indicates a broken bootloader or old kernel

Presumably this 'kimage_start' isn't pointing to the new kernel, but to the
purgatory code (which comes from user-space?). If so, what are these xzr-s
for?


> +     br      x4
> +
> +.align 3     /* To keep the 64-bit values below naturally aligned. */
> +
> +/* The machine_kexec routine sets these variables via offsets from
> + * arm64_relocate_new_kernel.
> + */
> +
> +/*
> + * .Lkimage_start - Copy of image->start, the entry point of the new
> + * image.
> + */
> +.Lkimage_start:
> +     .quad   0x0
> +
> +/*
> + * .Lkimage_head - Copy of image->head, the list of kimage entries.
> + */
> +.Lkimage_head:
> +     .quad   0x0
> +

I assume these .quad-s are used because you can't pass the values in via
registers - due to the complicated soft_restart(). Given you are the only
user, couldn't you simplify it to do all the disabling in
arm64_relocate_new_kernel?
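
For anyone following along, my understanding of the current "set via offsets"
scheme is roughly the following, modelled in user-space C. The helper names
are invented for illustration and this is not the code in machine_kexec.c: the
relocation code is copied to the control page first, and the two .quad slots
in that copy are then filled in at the exported offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: store a 64-bit value 'offset' bytes into the copy
 * of the relocation code that lives in the control page.
 */
static void reloc_set_quad(unsigned char *copy, size_t copy_size,
			   size_t offset, uint64_t value)
{
	/* The slots are naturally aligned (.align 3) and inside the copy. */
	assert(offset % sizeof(value) == 0);
	assert(offset + sizeof(value) <= copy_size);
	memcpy(copy + offset, &value, sizeof(value));
}

/* Hypothetical helper: read a slot back, e.g. for a sanity check. */
static uint64_t reloc_get_quad(const unsigned char *copy, size_t offset)
{
	uint64_t value;

	memcpy(&value, copy + offset, sizeof(value));
	return value;
}
```

Passing the two values in registers instead would make both the slots and the
exported offset symbols unnecessary.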


> +.Lcopy_end:
> +.org KEXEC_CONTROL_PAGE_SIZE
> +
> +/*
> + * arm64_relocate_new_kernel_size - Number of bytes to copy to the control_code_page.
> + */
> +.globl arm64_relocate_new_kernel_size
> +arm64_relocate_new_kernel_size:
> +     .quad   .Lcopy_end - arm64_relocate_new_kernel
> +
> +/*
> + * arm64_kexec_kimage_start_offset - Offset for writing .Lkimage_start.
> + */
> +.globl arm64_kexec_kimage_start_offset
> +arm64_kexec_kimage_start_offset:
> +     .quad   .Lkimage_start - arm64_relocate_new_kernel
> +
> +/*
> + * arm64_kexec_kimage_head_offset - Offset for writing .Lkimage_head.
> + */
> +.globl arm64_kexec_kimage_head_offset
> +arm64_kexec_kimage_head_offset:
> +     .quad   .Lkimage_head - arm64_relocate_new_kernel


From 'kexec -e' to the first messages from the new kernel takes ~1 minute
on Juno. Did you see a similar delay? Or should I go looking for what I've
configured wrong!?

(Copying code with the mmu+caches on, then cleaning to PoC, was noticeably
faster for hibernate.)


I've used this series for kexec-ing between 4K and 64K page_size kernels on
Juno.

Tested-By: James Morse <[email protected]>



Thanks!

James





