On Tue, Nov 26, 2013 at 01:57:51PM +0800, Dave Young wrote:
> kexec kernel will need exactly same mapping for
> efi runtime memory ranges. Thus here export the
> runtime ranges mapping to sysfs, kexec-tools
> will assemble them and pass to 2nd kernel via
> setup_data.
> 
> Introducing a new directly /sys/firmware/efi/runtime-map
> Just like /sys/firmware/memmap. Containing below attribute
> in each file of that directory:
> attribute  num_pages  phys_addr  type  virt_addr
> 
> It will not work for efi 32bit. Only x86_64 currently.
> 
> Matt: s/efi-runtime-map.c/runtime-map.c
>       change dir name to runtime-map
> update to use desc_size in efi_runtime_map
> cleaup the code, add function efi_save_runtime_map
> improve err handling
> 
> Signed-off-by: Dave Young <dyo...@redhat.com>
> ---
>  .../ABI/testing/sysfs-firmware-efi-runtime-map     |  45 +++++
>  arch/x86/platform/efi/efi.c                        |  26 +++
>  drivers/firmware/efi/Kconfig                       |  10 ++
>  drivers/firmware/efi/Makefile                      |   1 +
>  drivers/firmware/efi/efi.c                         |   3 +-
>  drivers/firmware/efi/runtime-map.c                 | 199 
> +++++++++++++++++++++
>  include/linux/efi.h                                |   6 +
>  7 files changed, 289 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-firmware-efi-runtime-map
>  create mode 100644 drivers/firmware/efi/runtime-map.c
> 
> diff --git a/Documentation/ABI/testing/sysfs-firmware-efi-runtime-map 
> b/Documentation/ABI/testing/sysfs-firmware-efi-runtime-map
> new file mode 100644
> index 0000000..dab8d41
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-firmware-efi-runtime-map
> @@ -0,0 +1,45 @@
> +What:                /sys/firmware/efi/runtime-map/
> +Date:                Oct 2013
> +Contact:     Dave Young <dyo...@redhat.com>
> +Description:
> +             Switching efi runtime services to virtual mode requires
> +             that all efi memory ranges which has the runtime attribute

s/has/have/

> +             bit set to be mapped to virtual addresses.
> +
> +             In kexec kernel kernel can not entering virtual mode again
> +             because there's a limitation that SetVirtualAddressMap can
> +             only be called once for entering virtual mode. But kexec
> +             kernel still need maintain same physical address to virtual
> +             address mapping as the 1st kernel.

Simply say

"SetVirtualAddressMap has the limitation that it can be called only once
per system lifetime so a second kernel, like kexec, needs to reuse the
exact same virtual addresses like the first kernel."


>               The mappings are exported
> +             to sysfs so userspace tools can reassemble them and pass them
> +             into kexec kernel.
> +
> +             /sys/firmware/efi/runtim-map/ is what kernel export for
> +             this purpose. The structure is as follows:

"/sys/firmware/efi/runtime-map/ is the directory the kernel exports that
information in."

> +
> +             subdirectories are named with the number of the memory range:
> +
> +                     /sys/firmware/efi/runtime-map/0
> +                     /sys/firmware/efi/runtime-map/1
> +                     /sys/firmware/efi/runtime-map/2
> +                     /sys/firmware/efi/runtime-map/3
> +                     ...
> +
> +             Each subdirectory contains five files:
> +
> +             attribute : The attribute of the memory range.

                                attributes

> +             num_pages : The size of the memory range in page number.

                                                         in 4K pages.

> +             phys_addr : The start physical address of the memory range.

                            The physical address of the memory range

> +             type      : The type of the memory range.
> +             virt_addr : The start virtual address of the memory range.

                            The virtual address...

> +
> +             Above values are all hexadecimal number with the '0x' prefix.

                                                 numbers

> +
> +             So, for example:
> +
> +                     /sys/firmware/efi/runtime-map/0/attribute
> +                     /sys/firmware/efi/runtime-map/0/num_pages
> +                     /sys/firmware/efi/runtime-map/0/phys_addr
> +                     /sys/firmware/efi/runtime-map/0/type
> +                     /sys/firmware/efi/runtime-map/0/virt_addr
> +                        ...

What is this example supposed to show? The above is already detailed
enough.

> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index 2a292aa..c3a2aaa 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -76,6 +76,9 @@ static __initdata efi_config_table_type_t arch_tables[] = {
>       {NULL_GUID, NULL, NULL},
>  };
>  
> +void *efi_runtime_map;
> +int nr_efi_runtime_map;
> +
>  /*
>   * Returns 1 if 'facility' is enabled, 0 otherwise.
>   */
> @@ -811,6 +814,21 @@ static void __init efi_merge_regions(void)
>       }
>  }
>  
> +static int __init efi_save_runtime_map(efi_memory_desc_t *md)

It is a static function so no need for the "efi_" prefix.

Also, it should be called save_runtime_mem_desc or so because this is
what it does.

> +{
> +     efi_runtime_map = krealloc(efi_runtime_map,
> +                     (nr_efi_runtime_map + 1) *
> +                     memmap.desc_size, GFP_KERNEL);

WARNING: Reusing the krealloc arg is almost always a bug
#21: FILE: arch/x86/platform/efi/efi.c:819:
+       efi_runtime_map = krealloc(efi_runtime_map,

Please run your diffs through checkpatch.pl --strict before submitting.

Btw, we use krealloc below again which should be changed too.

> +     if (!efi_runtime_map)
> +             return -ENOMEM;
> +
> +     memcpy(efi_runtime_map + nr_efi_runtime_map * memmap.desc_size,
> +             md, memmap.desc_size);
> +     nr_efi_runtime_map++;
> +
> +     return 0;
> +}
> +
>  /*
>   * Map efi memory ranges for runtime serivce and update new_memmap with 
> virtual
>   * addresses.
> @@ -821,6 +839,7 @@ static void * __init efi_map_regions(int *count)
>       void *p, *new_memmap = NULL;
>       unsigned long size;
>       u64 end, systab;
> +     int error = 0;
>  
>       for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
>               md = p;
> @@ -852,9 +871,16 @@ static void * __init efi_map_regions(int *count)
>  
>               memcpy(new_memmap + (*count * memmap.desc_size), md,
>                      memmap.desc_size);
> +             if (!error && md->type != EFI_BOOT_SERVICES_CODE &&
> +                             md->type != EFI_BOOT_SERVICES_DATA)
> +                     error = efi_save_runtime_map(md);

I'm not sure I like this entangled stuff with the error code - I can
barely understand what's going on there - not necessarily because of
your changes but because this code is not that trivial, especially after
time has passed.

And I don't see you kfreeing efi_runtime_map in the case where the first
krealloc() fails at some iteration.

What would be much cleaner IMO would be if you let efi_map_regions()
finish, *and*, *only* if it was successful iterate again over the
regions and do efi_save_runtime_map() on the runtime ones. This will
make the code much more readable too.

>               (*count)++;
>       }
>  
> +     if (error) {
> +             nr_efi_runtime_map = 0;
> +             kfree(efi_runtime_map);
> +     }

This error gets overwritten each time the loop runs and you possibly
lose !0 values. What I'm suggesting above would take care of that case
too.

>  ret:
>       return new_memmap;
>  }
> diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig
> index b0fc7c7..b4d01ad 100644
> --- a/drivers/firmware/efi/Kconfig
> +++ b/drivers/firmware/efi/Kconfig
> @@ -36,4 +36,14 @@ config EFI_VARS_PSTORE_DEFAULT_DISABLE
>         backend for pstore by default. This setting can be overridden
>         using the efivars module's pstore_disable parameter.
>  
> +config EFI_RUNTIME_MAP
> +     bool "Export efi runtime maps to sysfs" if EXPERT
> +     default X86 && EFI

This should depend on KEXEC.

> +     help
> +       Export efi runtime memory maps to /sys/firmware/efi/runtime-map.
> +       That memory map is used for example by kexec to set up efi virtual
> +       mapping the 2nd kernel, but can also be used for debugging purposes.
> +
> +       See also Documentation/ABI/testing/sysfs-firmware-efi-runtime-map.
> +
>  endmenu
> diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile
> index 99245ab..ecadcc1 100644
> --- a/drivers/firmware/efi/Makefile
> +++ b/drivers/firmware/efi/Makefile
> @@ -4,3 +4,4 @@
>  obj-y                                        += efi.o vars.o
>  obj-$(CONFIG_EFI_VARS)                       += efivars.o
>  obj-$(CONFIG_EFI_VARS_PSTORE)                += efi-pstore.o
> +obj-$(CONFIG_EFI_RUNTIME_MAP)                += runtime-map.o

This will conflict with upstream:

checking file drivers/firmware/efi/Makefile
Hunk #1 FAILED at 4.
1 out of 1 hunk FAILED

It should be:

diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile
index 9ba156d3c775..7f05c368b868 100644
--- a/drivers/firmware/efi/Makefile
+++ b/drivers/firmware/efi/Makefile
@@ -5,3 +5,4 @@ obj-y                                   += efi.o vars.o
 obj-$(CONFIG_EFI_VARS)                 += efivars.o
 obj-$(CONFIG_EFI_VARS_PSTORE)          += efi-pstore.o
 obj-$(CONFIG_UEFI_CPER)                        += cper.o
+obj-$(CONFIG_EFI_RUNTIME_MAP)          += runtime-map.o


> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> index 5d85956..a480de8 100644
> --- a/drivers/firmware/efi/efi.c
> +++ b/drivers/firmware/efi/efi.c
> @@ -38,7 +38,8 @@ struct efi __read_mostly efi = {
>  };
>  EXPORT_SYMBOL(efi);
>  
> -static struct kobject *efi_kobj;
> +struct kobject *efi_kobj;
> +EXPORT_SYMBOL_GPL(efi_kobj);

Can we move this private to linux/drivers/efi/runtime-map.c? If yes,
then there's no need to export it.

>  static struct kobject *efivars_kobj;
>  
>  /*
> diff --git a/drivers/firmware/efi/runtime-map.c 
> b/drivers/firmware/efi/runtime-map.c
> new file mode 100644
> index 0000000..036d099
> --- /dev/null
> +++ b/drivers/firmware/efi/runtime-map.c
> @@ -0,0 +1,199 @@
> +/*
> + * linux/drivers/efi/runtime-map.c
> + * Copyright (C) 2013 Red Hat, Inc., Dave Young <dyo...@redhat.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License v2.0 as published by
> + * the Free Software Foundation
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.

Please drop this boilerplate text and do a single sentence saying that
it is licensed under GPL v2 and refer to COPYING in the kernel repo.

> + *
> + */
> +
> +#include <linux/string.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/efi.h>
> +#include <linux/slab.h>
> +
> +#include <asm/setup.h>
> +
> +struct efi_runtime_map_entry {
> +     efi_memory_desc_t md;
> +     struct kobject kobj;   /* kobject for each entry */
> +};
> +
> +static struct efi_runtime_map_entry **map_entries;
> +
> +static ssize_t map_attr_show(struct kobject *kobj,
> +                             struct attribute *attr, char *buf);
> +static ssize_t type_show(struct efi_runtime_map_entry *entry, char *buf);
> +static ssize_t phys_addr_show(struct efi_runtime_map_entry *entry, char 
> *buf);
> +static ssize_t virt_addr_show(struct efi_runtime_map_entry *entry, char 
> *buf);
> +static ssize_t num_pages_show(struct efi_runtime_map_entry *entry, char 
> *buf);
> +static ssize_t attribute_show(struct efi_runtime_map_entry *entry, char 
> *buf);
> +
> +struct map_attribute {
> +     struct attribute attr;
> +     ssize_t (*show)(struct efi_runtime_map_entry *entry, char *buf);
> +};
> +
> +static struct map_attribute map_type_attr = __ATTR_RO(type);
> +static struct map_attribute map_phys_addr_attr   = __ATTR_RO(phys_addr);
> +static struct map_attribute map_virt_addr_attr  = __ATTR_RO(virt_addr);
> +static struct map_attribute map_num_pages_attr  = __ATTR_RO(num_pages);
> +static struct map_attribute map_attribute_attr  = __ATTR_RO(attribute);
> +
> +/*
> + * These are default attributes that are added for every memmap entry.
> + */
> +static struct attribute *def_attrs[] = {
> +     &map_type_attr.attr,
> +     &map_phys_addr_attr.attr,
> +     &map_virt_addr_attr.attr,
> +     &map_num_pages_attr.attr,
> +     &map_attribute_attr.attr,
> +     NULL
> +};
> +
> +static const struct sysfs_ops map_attr_ops = {
> +     .show = map_attr_show,
> +};
> +
> +static inline struct efi_runtime_map_entry *to_map_entry(struct kobject 
> *kobj)
> +{
> +     return container_of(kobj, struct efi_runtime_map_entry, kobj);
> +}
> +
> +static void map_release(struct kobject *kobj)
> +{
> +     struct efi_runtime_map_entry *entry;
> +
> +     entry = to_map_entry(kobj);
> +     kfree(entry);
> +}
> +
> +static struct kobj_type __refdata map_ktype = {
> +     .sysfs_ops      = &map_attr_ops,
> +     .default_attrs  = def_attrs,
> +     .release        = map_release,
> +};
> +
> +static ssize_t type_show(struct efi_runtime_map_entry *entry, char *buf)
> +{
> +     return snprintf(buf, PAGE_SIZE, "0x%x\n", entry->md.type);
> +}
> +
> +static ssize_t phys_addr_show(struct efi_runtime_map_entry *entry, char *buf)
> +{
> +     return snprintf(buf, PAGE_SIZE, "0x%llx\n", entry->md.phys_addr);
> +}
> +
> +static ssize_t virt_addr_show(struct efi_runtime_map_entry *entry, char *buf)
> +{
> +     return snprintf(buf, PAGE_SIZE, "0x%llx\n", entry->md.virt_addr);
> +}
> +
> +static ssize_t num_pages_show(struct efi_runtime_map_entry *entry, char *buf)
> +{
> +     return snprintf(buf, PAGE_SIZE, "0x%llx\n", entry->md.num_pages);
> +}
> +
> +static ssize_t attribute_show(struct efi_runtime_map_entry *entry, char *buf)
> +{
> +     return snprintf(buf, PAGE_SIZE, "0x%llx\n", entry->md.attribute);
> +}

I can see the macro here writing itself too :)

> +
> +static inline struct map_attribute *to_map_attr(struct attribute *attr)
> +{
> +     return container_of(attr, struct map_attribute, attr);
> +}
> +
> +static ssize_t map_attr_show(struct kobject *kobj,
> +                             struct attribute *attr, char *buf)
> +{
> +     struct efi_runtime_map_entry *entry = to_map_entry(kobj);
> +     struct map_attribute *map_attr = to_map_attr(attr);
> +
> +     return map_attr->show(entry, buf);
> +}
> +
> +static struct kset *map_kset;
> +
> +/*
> + * Add map entry on sysfs
> + */

The function name should be enough - no need for the comment.

> +static struct efi_runtime_map_entry *add_sysfs_runtime_map_entry(int nr)
> +{
> +     int ret;
> +     unsigned long desc_size;
> +     struct efi_runtime_map_entry *entry;
> +     struct efi_info *e = &boot_params.efi_info;
> +
> +     if (!map_kset) {
> +             map_kset = kset_create_and_add("runtime-map", NULL,
> +                             efi_kobj);
> +             if (!map_kset)
> +                     return ERR_PTR(-ENOMEM);
> +     }
> +
> +     entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> +     if (!entry) {
> +             kset_unregister(map_kset);
> +             return entry;
> +     }
> +
> +     desc_size = e->efi_memdesc_size;
> +     memcpy(&entry->md, efi_runtime_map + nr * desc_size,
> +             sizeof(efi_memory_desc_t));

If you're going to break this line anyway, you don't really need the
local desc_size variable:

        memcpy(&entry->md,
               efi_runtime_map + nr * e->efi_memdesc_size,
               sizeof(efi_memory_desc_t));

> +
> +     kobject_init(&entry->kobj, &map_ktype);
> +     entry->kobj.kset = map_kset;
> +     ret = kobject_add(&entry->kobj, NULL, "%d", nr);
> +     if (ret) {
> +             kobject_put(&entry->kobj);
> +             kset_unregister(map_kset);
> +             return ERR_PTR(ret);
> +     }
> +
> +     return entry;
> +}
> +
> +static int __init efi_runtime_map_init(void)
> +{
> +     int i, j, ret = 0;
> +     struct efi_runtime_map_entry *entry;
> +
> +     if (!efi_runtime_map) {
> +             pr_warn("no efi_runtime_map found\n");
> +             return -EINVAL;
> +     }
> +
> +     map_entries = kzalloc(nr_efi_runtime_map *
> +             sizeof(struct efi_runtime_map_entry *), GFP_KERNEL);

Shorten:

        map_entries = kzalloc(nr_efi_runtime_map * sizeof(entry), GFP_KERNEL);

> +     if (!map_entries)
> +             return -ENOMEM;
> +
> +     for (i = 0; i < nr_efi_runtime_map; i++) {
> +             entry = add_sysfs_runtime_map_entry(i);
> +             if (IS_ERR(entry)) {
> +                     for (j = i - 1; j > 0; j--) {
> +                             entry = *(map_entries + j);
> +                             kobject_put(&entry->kobj);
> +                     }
> +                     if (map_kset)
> +                             kset_unregister(map_kset);
> +                     ret = PTR_ERR(entry);
> +                     goto out;
> +             }

You can move this error unwinding path away from the main loop:

        if (IS_ERR(entry))
                goto out;

and make it more readable.

> +             *(map_entries + i) = entry;
> +     }
> +
> +out:
> +     return ret;
> +}
> +late_initcall(efi_runtime_map_init);

Why an initcall? Can't we run this from efisubsys_init()?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to