Re: [PATCH] x86/kexec: Add EFI config table identity mapping for kexec kernel
Add Ard to CC.

On 05/25/23 at 05:49pm, Tao Liu wrote:
> A kexec kernel bootup hang is observed on Intel Atom cpu due to unmapped
> EFI config table.
>
> Currently EFI system table is identity-mapped for the kexec kernel, but EFI
> config table is not mapped explicitly:
>
> commit 6bbeb276b71f ("x86/kexec: Add the EFI system tables and ACPI
> tables to the ident map")
>
> Later in the following 2 commits, EFI config table will be accessed when
> enabling sev at kernel startup. This may result in a page fault due to EFI
> config table's unmapped address. Since the page fault occurs at an early
> stage, it is unrecoverable and kernel hangs.
>
> commit ec1c66af3a30 ("x86/compressed/64: Detect/setup SEV/SME features
> earlier during boot")
> commit c01fce9cef84 ("x86/compressed: Add SEV-SNP feature
> detection/setup")
>
> In addition, the issue doesn't appear on all systems, because the kexec
> kernel uses Page Size Extension (PSE) for identity mapping. In most cases,
> EFI config table can end up to be mapped into due to 1 GB page size.
> However if nogbpages is set, or cpu doesn't support pdpe1gb feature
> (e.g Intel Atom x6425RE cpu), EFI config table may not be mapped into
> due to 2 MB page size, thus a page fault hang is more likely to happen.
>
> In this patch, we will make sure the EFI config table is always mapped.
> > Signed-off-by: Tao Liu > --- > arch/x86/kernel/machine_kexec_64.c | 35 ++ > 1 file changed, 31 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/kernel/machine_kexec_64.c > b/arch/x86/kernel/machine_kexec_64.c > index 1a3e2c05a8a5..755aa12f583f 100644 > --- a/arch/x86/kernel/machine_kexec_64.c > +++ b/arch/x86/kernel/machine_kexec_64.c > @@ -28,6 +28,7 @@ > #include > #include > #include > +#include > > #ifdef CONFIG_ACPI > /* > @@ -86,10 +87,12 @@ const struct kexec_file_ops * const kexec_file_loaders[] > = { > #endif > > static int > -map_efi_systab(struct x86_mapping_info *info, pgd_t *level4p) > +map_efi_sys_cfg_tab(struct x86_mapping_info *info, pgd_t *level4p) > { > #ifdef CONFIG_EFI > unsigned long mstart, mend; > + void *kaddr; > + int ret; > > if (!efi_enabled(EFI_BOOT)) > return 0; > @@ -105,6 +108,30 @@ map_efi_systab(struct x86_mapping_info *info, pgd_t > *level4p) > if (!mstart) > return 0; > > + ret = kernel_ident_mapping_init(info, level4p, mstart, mend); > + if (ret) > + return ret; > + > + kaddr = memremap(mstart, mend - mstart, MEMREMAP_WB); > + if (!kaddr) { > + pr_err("Could not map UEFI system table\n"); > + return -ENOMEM; > + } > + > + mstart = efi_config_table; > + > + if (efi_enabled(EFI_64BIT)) { > + efi_system_table_64_t *stbl = (efi_system_table_64_t *)kaddr; > + > + mend = mstart + sizeof(efi_config_table_64_t) * stbl->nr_tables; > + } else { > + efi_system_table_32_t *stbl = (efi_system_table_32_t *)kaddr; > + > + mend = mstart + sizeof(efi_config_table_32_t) * stbl->nr_tables; > + } > + > + memunmap(kaddr); > + > return kernel_ident_mapping_init(info, level4p, mstart, mend); > #endif > return 0; > @@ -244,10 +271,10 @@ static int init_pgtable(struct kimage *image, unsigned > long start_pgtable) > } > > /* > - * Prepare EFI systab and ACPI tables for kexec kernel since they are > - * not covered by pfn_mapped. 
> + * Prepare EFI systab, config table and ACPI tables for kexec kernel > + * since they are not covered by pfn_mapped. >*/ > - result = map_efi_systab(, level4p); > + result = map_efi_sys_cfg_tab(, level4p); > if (result) > return result; > > -- > 2.33.1 > ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] x86/kexec: Add EFI config table identity mapping for kexec kernel
Hi Tao,

On 05/25/23 at 05:49pm, Tao Liu wrote:
> A kexec kernel bootup hang is observed on Intel Atom cpu due to unmapped
> EFI config table.
>
> Currently EFI system table is identity-mapped for the kexec kernel, but EFI
> config table is not mapped explicitly:
>
> commit 6bbeb276b71f ("x86/kexec: Add the EFI system tables and ACPI
> tables to the ident map")
>
> Later in the following 2 commits, EFI config table will be accessed when
> enabling sev at kernel startup. This may result in a page fault due to EFI
> config table's unmapped address. Since the page fault occurs at an early
> stage, it is unrecoverable and kernel hangs.
>
> commit ec1c66af3a30 ("x86/compressed/64: Detect/setup SEV/SME features
> earlier during boot")
> commit c01fce9cef84 ("x86/compressed: Add SEV-SNP feature
> detection/setup")
>
> In addition, the issue doesn't appear on all systems, because the kexec
> kernel uses Page Size Extension (PSE) for identity mapping. In most cases,
> EFI config table can end up to be mapped into due to 1 GB page size.
> However if nogbpages is set, or cpu doesn't support pdpe1gb feature
> (e.g Intel Atom x6425RE cpu), EFI config table may not be mapped into
> due to 2 MB page size, thus a page fault hang is more likely to happen.
>
> In this patch, we will make sure the EFI config table is always mapped.

Nice work. However, you may need to rephrase the sentence above: x86
maintainers don't like changelogs that use 'this patch' or 'we'. Please
refer to the 'Changelog' part of Documentation/process/maintainer-tip.rst
and improve it.
>
> Signed-off-by: Tao Liu
> ---
>  arch/x86/kernel/machine_kexec_64.c | 35 ++
>  1 file changed, 31 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/machine_kexec_64.c
> b/arch/x86/kernel/machine_kexec_64.c
> index 1a3e2c05a8a5..755aa12f583f 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -28,6 +28,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #ifdef CONFIG_ACPI
>  /*
> @@ -86,10 +87,12 @@ const struct kexec_file_ops * const kexec_file_loaders[]
> = {
>  #endif
>
>  static int
> -map_efi_systab(struct x86_mapping_info *info, pgd_t *level4p)
> +map_efi_sys_cfg_tab(struct x86_mapping_info *info, pgd_t *level4p)

Can we call the function map_efi_tables(), since it will map both the EFI
system table and the EFI config table? If you need to add another table
mapping here later, what would you call it, map_efi_sys_cfg_xxx_tab()?
Anyway, I don't have a strong opinion, as long as the x86 maintainers like
it.

>  {
>  #ifdef CONFIG_EFI
>  	unsigned long mstart, mend;
> +	void *kaddr;
> +	int ret;
>
>  	if (!efi_enabled(EFI_BOOT))
>  		return 0;
> @@ -105,6 +108,30 @@ map_efi_systab(struct x86_mapping_info *info, pgd_t
> *level4p)
>  	if (!mstart)
>  		return 0;
>
> +	ret = kernel_ident_mapping_init(info, level4p, mstart, mend);
> +	if (ret)
> +		return ret;
> +
> +	kaddr = memremap(mstart, mend - mstart, MEMREMAP_WB);
> +	if (!kaddr) {
> +		pr_err("Could not map UEFI system table\n");
> +		return -ENOMEM;
> +	}
> +
> +	mstart = efi_config_table;
> +
> +	if (efi_enabled(EFI_64BIT)) {
> +		efi_system_table_64_t *stbl = (efi_system_table_64_t *)kaddr;
> +
> +		mend = mstart + sizeof(efi_config_table_64_t) * stbl->nr_tables;
> +	} else {
> +		efi_system_table_32_t *stbl = (efi_system_table_32_t *)kaddr;
> +
> +		mend = mstart + sizeof(efi_config_table_32_t) * stbl->nr_tables;
> +	}
> +
> +	memunmap(kaddr);
> +
>  	return kernel_ident_mapping_init(info, level4p, mstart, mend);
>  #endif
>  	return 0;
> @@ -244,10 +271,10 @@ static int init_pgtable(struct kimage *image, unsigned
> long start_pgtable)
>  	}
>
>  	/*
> -	 * Prepare EFI systab and ACPI tables for kexec kernel since they are
> -	 * not covered by pfn_mapped.
> +	 * Prepare EFI systab, config table and ACPI tables for kexec kernel
> +	 * since they are not covered by pfn_mapped.
>  	 */
> -	result = map_efi_systab(&info, level4p);
> +	result = map_efi_sys_cfg_tab(&info, level4p);
>  	if (result)
>  		return result;
>
> --
> 2.33.1
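For readers following the thread: the second range mapped by the patch is sized as the number of config table entries multiplied by the per-entry size. Below is a minimal user-space sketch of that arithmetic. The struct layouts are simplified stand-ins (an assumption) for efi_config_table_64_t and efi_config_table_32_t, and config_table_end() is a hypothetical helper name, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's EFI config table entry types. */
struct cfg_entry_64 {
	uint8_t  guid[16];	/* vendor GUID */
	uint64_t table;		/* physical address of the vendor table */
};

struct cfg_entry_32 {
	uint8_t  guid[16];	/* vendor GUID */
	uint32_t table;		/* physical address of the vendor table */
};

/*
 * Hypothetical helper: return the exclusive end address of a config
 * table array that starts at 'start' and holds 'nr_tables' entries.
 */
static uint64_t config_table_end(uint64_t start, uint64_t nr_tables,
				 bool is_64bit)
{
	uint64_t entry_size = is_64bit ? sizeof(struct cfg_entry_64)
				       : sizeof(struct cfg_entry_32);

	return start + entry_size * nr_tables;
}
```

The sketch only shows why nr_tables has to be read from the system table before the config table range can be mapped; it does not model the memremap() step.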
Re: [PATCH 1/3] ext4: replace kthread freezing with auto fs freezing
On Sun 07-05-23 18:19:25, Luis Chamberlain wrote: > The kernel power management now supports allowing the VFS > to handle filesystem freezing freezes and thawing. Take advantage > of that and remove the kthread freezing. This is needed so that we > properly really stop IO in flight without races after userspace > has been frozen. Without this we rely on kthread freezing and > its semantics are loose and error prone. > > The filesystem therefore is in charge of properly dealing with > quiescing of the filesystem through its callbacks if it thinks > it knows better than how the VFS handles it. > > The following Coccinelle rule was used as to remove the now superfluous > freezer calls: > > make coccicheck MODE=patch SPFLAGS="--in-place --no-show-diff" > COCCI=./fs-freeze-cleanup.cocci M=fs/ext4 > > virtual patch > > @ remove_set_freezable @ > expression time; > statement S, S2; > expression task, current; > @@ > > ( > - set_freezable(); > | > - if (try_to_freeze()) > - continue; > | > - try_to_freeze(); > | > - freezable_schedule(); > + schedule(); > | > - freezable_schedule_timeout(time); > + schedule_timeout(time); > | > - if (freezing(task)) { S } > | > - if (freezing(task)) { S } > - else > { S2 } > | > - freezing(current) > ) > > @ remove_wq_freezable @ > expression WQ_E, WQ_ARG1, WQ_ARG2, WQ_ARG3, WQ_ARG4; > identifier fs_wq_fn; > @@ > > ( > WQ_E = alloc_workqueue(WQ_ARG1, > - WQ_ARG2 | WQ_FREEZABLE, > + WQ_ARG2, > ...); > | > WQ_E = alloc_workqueue(WQ_ARG1, > - WQ_ARG2 | WQ_FREEZABLE | WQ_ARG3, > + WQ_ARG2 | WQ_ARG3, > ...); > | > WQ_E = alloc_workqueue(WQ_ARG1, > - WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE, > + WQ_ARG2 | WQ_ARG3, > ...); > | > WQ_E = alloc_workqueue(WQ_ARG1, > - WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE | WQ_ARG4, > + WQ_ARG2 | WQ_ARG3 | WQ_ARG4, > ...); > | > WQ_E = > - WQ_ARG1 | WQ_FREEZABLE > + WQ_ARG1 > | > WQ_E = > - WQ_ARG1 | WQ_FREEZABLE | WQ_ARG3 > + WQ_ARG1 | WQ_ARG3 > | > fs_wq_fn( > - WQ_FREEZABLE | WQ_ARG2 | WQ_ARG3 > + WQ_ARG2 | WQ_ARG3 > ) > | > 
fs_wq_fn( > - WQ_FREEZABLE | WQ_ARG2 > + WQ_ARG2 > ) > | > fs_wq_fn( > - WQ_FREEZABLE > + 0 > ) > ) > > @ add_auto_flag @ > expression E1; > identifier fs_type; > @@ > > struct file_system_type fs_type = { > .fs_flags = E1 > + | FS_AUTOFREEZE > , > }; > > Generated-by: Coccinelle SmPL > Signed-off-by: Luis Chamberlain I guess we can also usually remove the #include line? At least in ext4 it is the case I believe. Otherwise this looks good. Honza > --- > fs/ext4/super.c | 9 +++-- > 1 file changed, 3 insertions(+), 6 deletions(-) > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > index d39f386e9baf..1f436938d8be 100644 > --- a/fs/ext4/super.c > +++ b/fs/ext4/super.c > @@ -136,7 +136,7 @@ static struct file_system_type ext2_fs_type = { > .init_fs_context= ext4_init_fs_context, > .parameters = ext4_param_specs, > .kill_sb= kill_block_super, > - .fs_flags = FS_REQUIRES_DEV, > + .fs_flags = FS_REQUIRES_DEV | FS_AUTOFREEZE, > }; > MODULE_ALIAS_FS("ext2"); > MODULE_ALIAS("ext2"); > @@ -152,7 +152,7 @@ static struct file_system_type ext3_fs_type = { > .init_fs_context= ext4_init_fs_context, > .parameters = ext4_param_specs, > .kill_sb= kill_block_super, > - .fs_flags = FS_REQUIRES_DEV, > + .fs_flags = FS_REQUIRES_DEV | FS_AUTOFREEZE, > }; > MODULE_ALIAS_FS("ext3"); > MODULE_ALIAS("ext3"); > @@ -3790,7 +3790,6 @@ static int ext4_lazyinit_thread(void *arg) > unsigned long next_wakeup, cur; > > BUG_ON(NULL == eli); > - set_freezable(); > > cont_thread: > while (true) { > @@ -3842,8 +3841,6 @@ static int ext4_lazyinit_thread(void *arg) > } > mutex_unlock(>li_list_mtx); > > - try_to_freeze(); > - > cur = jiffies; > if ((time_after_eq(cur, next_wakeup)) || > (MAX_JIFFY_OFFSET == next_wakeup)) { > @@ -7245,7 +7242,7 @@ static struct file_system_type ext4_fs_type = { > .init_fs_context= ext4_init_fs_context, > .parameters = ext4_param_specs, > .kill_sb= kill_block_super, > -
[PATCH] kexec: Avoid calculating array size twice
Avoid calculating array size twice in kexec_purgatory_setup_sechdrs().
Once using array_size(), and once open-coded.

Flagged by Coccinelle:

  .../kexec_file.c:881:8-25: WARNING: array_size is already used (line 877) to compute the same size

No functional change intended.
Compile tested only.

Signed-off-by: Simon Horman
---
 kernel/kexec_file.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index f989f5f1933b..3f5677679744 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -867,6 +867,7 @@ static int kexec_purgatory_setup_sechdrs(struct purgatory_info *pi,
 {
 	unsigned long bss_addr;
 	unsigned long offset;
+	size_t sechdrs_size;
 	Elf_Shdr *sechdrs;
 	int i;

@@ -874,11 +875,11 @@ static int kexec_purgatory_setup_sechdrs(struct purgatory_info *pi,
 	 * The section headers in kexec_purgatory are read-only. In order to
 	 * have them modifiable make a temporary copy.
 	 */
-	sechdrs = vzalloc(array_size(sizeof(Elf_Shdr), pi->ehdr->e_shnum));
+	sechdrs_size = array_size(sizeof(Elf_Shdr), pi->ehdr->e_shnum);
+	sechdrs = vzalloc(sechdrs_size);
 	if (!sechdrs)
 		return -ENOMEM;
-	memcpy(sechdrs, (void *)pi->ehdr + pi->ehdr->e_shoff,
-	       pi->ehdr->e_shnum * sizeof(Elf_Shdr));
+	memcpy(sechdrs, (void *)pi->ehdr + pi->ehdr->e_shoff, sechdrs_size);
 	pi->sechdrs = sechdrs;

 	offset = 0;
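As an illustration of why computing the size once is preferable, here is a minimal user-space sketch of the same pattern. array_size_sat() is a hypothetical stand-in for the kernel's array_size(), assumed here to saturate to SIZE_MAX on overflow; copy_table() is likewise an illustrative name. The sketch relies on the GCC/Clang builtin __builtin_mul_overflow.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Overflow-checked multiply: saturates to SIZE_MAX instead of wrapping. */
static size_t array_size_sat(size_t a, size_t b)
{
	size_t bytes;

	if (__builtin_mul_overflow(a, b, &bytes))
		return SIZE_MAX;
	return bytes;
}

/*
 * Allocate a zeroed copy of 'count' elements of 'elem_size' bytes each.
 * The byte count is computed once and reused for both the allocation
 * and the copy, so the two can never disagree.
 */
static void *copy_table(const void *src, size_t elem_size, size_t count)
{
	size_t bytes = array_size_sat(elem_size, count);
	void *dst;

	if (bytes == SIZE_MAX)
		return NULL;	/* overflow: refuse rather than under-allocate */

	dst = calloc(1, bytes);
	if (!dst)
		return NULL;

	memcpy(dst, src, bytes);
	return dst;
}
```

With the open-coded multiplication in the memcpy(), an overflowing product could wrap to a small value while the checked allocation size saturated; reusing one variable removes that mismatch as well as the duplicate computation.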
Re: [PATCH 2/6] fs: add frozen sb state helpers
On Sun 07-05-23 18:17:13, Luis Chamberlain wrote:
> Provide helpers so that we can check a superblock frozen state.
> This will make subsequent changes easier to read. This makes
> no functional changes.
>
> Reviewed-by: Jan Kara
> Signed-off-by: Luis Chamberlain

Just noticed one nit...

> diff --git a/fs/super.c b/fs/super.c
> index 0e9d48846684..46c6475fc765 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -905,7 +905,7 @@ int reconfigure_super(struct fs_context *fc)
>
>  	if (fc->sb_flags_mask & ~MS_RMT_MASK)
>  		return -EINVAL;
> -	if (sb->s_writers.frozen != SB_UNFROZEN)
> +	if (!(sb_is_unfrozen(sb)))
	     ^ unneeded parenthesis here

								Honza
-- 
Jan Kara
SUSE Labs, CR
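To make the nit concrete, a user-space sketch of what such state helpers might look like follows. Only sb_is_unfrozen() is named in the quoted hunk; struct sb_model, sb_is_frozen() and reconfigure_allowed() are illustrative assumptions, and the enum values mirror the kernel's freeze levels as I understand them.

```c
#include <stdbool.h>

/* Freeze levels, mirroring the kernel's SB_* constants. */
enum {
	SB_UNFROZEN = 0,
	SB_FREEZE_WRITE = 1,
	SB_FREEZE_PAGEFAULT = 2,
	SB_FREEZE_FS = 3,
	SB_FREEZE_COMPLETE = 4,
};

/* Model of the one superblock field the helpers look at. */
struct sb_model {
	int frozen;
};

static bool sb_is_unfrozen(const struct sb_model *sb)
{
	return sb->frozen == SB_UNFROZEN;
}

/* Hypothetical companion helper: true once freezing has fully completed. */
static bool sb_is_frozen(const struct sb_model *sb)
{
	return sb->frozen == SB_FREEZE_COMPLETE;
}

/* The check from the hunk, written without the extra parentheses. */
static bool reconfigure_allowed(const struct sb_model *sb)
{
	if (!sb_is_unfrozen(sb))
		return false;
	return true;
}
```

A state such as SB_FREEZE_WRITE is neither "unfrozen" nor "frozen" in this model, which is one reason to have two distinct helpers rather than negating one.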
Re: [PATCH 1/6] fs: unify locking semantics for fs freeze / thaw
On Sun 07-05-23 18:17:12, Luis Chamberlain wrote:
> Right now freeze_super() and thaw_super() are called with
> different locking contexts. To expand on this is messy, so
> just unify the requirement to require grabbing an active
> reference and keep the superblock locked.
>
> Suggested-by: Christoph Hellwig
> Signed-off-by: Luis Chamberlain

Finally got around to looking at this. Sorry for the delay. In principle I
like the direction but see below:

> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 61c5f9d26018..e31d6791d3e3 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -2166,7 +2166,10 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64
> block_count)
>  	if (err)
>  		return err;
>
> +	if (!get_active_super(sbi->sb->s_bdev))
> +		return -ENOTTY;

Calling get_active_super() like this is just sick. You rather want to
provide a helper for grabbing another active sb reference and locking the
sb when you already have an sb reference, because that is what is needed
in the vast majority of the places. Something like

void grab_active_super(struct super_block *sb)
{
	down_write(&sb->s_umount);
	atomic_inc(&sb->s_active);
}

> @@ -851,13 +849,13 @@ struct super_block *get_active_super(struct
> block_device *bdev)
>  		if (sb->s_bdev == bdev) {
>  			if (!grab_super(sb))
>  				goto restart;
> -			up_write(&sb->s_umount);
>  			return sb;
>  		}
>  	}
>  	spin_unlock(&sb_lock);
>  	return NULL;
>  }
> +EXPORT_SYMBOL_GPL(get_active_super);

And I'd call this grab_bdev_super(); there is no need to export it when
you have grab_active_super().

> @@ -1636,10 +1634,13 @@ static void sb_freeze_unlock(struct super_block *sb,
> int level)
>  }
>
>  /**
> - * freeze_super - lock the filesystem and force it into a consistent state
> + * freeze_super - force a filesystem backed by a block device into a
> consistent state
>   * @sb: the super to lock
>   *
> - * Syncs the super to make sure the filesystem is consistent and calls the
> fs's
> + * Used by filesystems and the kernel to freeze a fileystem backed by a block
> + * device into a consistent state. Callers must use get_active_super(bdev) to
> + * lock the @sb and when done must unlock it with deactivate_locked_super().
> + * Syncs the filesystem backed by the @sb and calls the filesystem's optional
>   * freeze_fs. Subsequent calls to this without first thawing the fs will
> return
>   * -EBUSY.
>   *
> @@ -1672,22 +1673,15 @@ int freeze_super(struct super_block *sb)
>  {
>  	int ret;
>
> -	atomic_inc(&sb->s_active);
> -	down_write(&sb->s_umount);
> -	if (sb->s_writers.frozen != SB_UNFROZEN) {
> -		deactivate_locked_super(sb);

At least add a warning for s_umount not being held here?

> +	if (sb->s_writers.frozen != SB_UNFROZEN)
>  		return -EBUSY;
> -	}
>
> -	if (!(sb->s_flags & SB_BORN)) {
> -		up_write(&sb->s_umount);
> +	if (!(sb->s_flags & SB_BORN))
>  		return 0;	/* sic - it's "nothing to do" */
> -	}
>
>  	if (sb_rdonly(sb)) {
>  		/* Nothing to do really... */
>  		sb->s_writers.frozen = SB_FREEZE_COMPLETE;
> -		up_write(&sb->s_umount);
>  		return 0;
>  	}

								Honza
-- 
Jan Kara
SUSE Labs, CR
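The calling convention discussed in this review (the caller takes the lock and an active reference first, and freeze_super() then only checks state) can be modelled in user space. This is a rough single-threaded sketch, not kernel code: struct sb_model and both *_model() functions are hypothetical, the lock is reduced to a boolean, and returning -EINVAL for a missing lock is a model-only stand-in for the warning suggested above.

```c
#include <errno.h>
#include <stdbool.h>

enum { SB_UNFROZEN = 0, SB_FREEZE_COMPLETE = 4 };

/* Single-threaded model of the superblock fields involved. */
struct sb_model {
	int  frozen;		/* stands in for sb->s_writers.frozen */
	bool born;		/* stands in for the SB_BORN flag */
	bool rdonly;		/* stands in for sb_rdonly(sb) */
	bool umount_held;	/* stands in for holding s_umount for write */
	int  active;		/* stands in for the s_active count */
};

/* Model of the suggested helper: lock the sb, then take an active ref. */
static void grab_active_super_model(struct sb_model *sb)
{
	sb->umount_held = true;
	sb->active++;
}

/* Model of freeze_super() under the "caller already holds the lock" rule. */
static int freeze_super_model(struct sb_model *sb)
{
	if (!sb->umount_held)
		return -EINVAL;	/* model-only: the review asks for a warning */

	if (sb->frozen != SB_UNFROZEN)
		return -EBUSY;

	if (!sb->born)
		return 0;	/* nothing to do */

	/* Read-only and read-write both end up fully frozen in this model;
	 * the real function syncs and calls freeze_fs for the latter. */
	sb->frozen = SB_FREEZE_COMPLETE;
	return 0;
}
```

The point of the model is only that every early return now leaves the lock state untouched, which is why the up_write() and deactivate_locked_super() calls disappear from the quoted hunk.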
[PATCH] x86/kexec: Add EFI config table identity mapping for kexec kernel
A kexec kernel bootup hang is observed on Intel Atom cpu due to unmapped
EFI config table.

Currently EFI system table is identity-mapped for the kexec kernel, but EFI
config table is not mapped explicitly:

commit 6bbeb276b71f ("x86/kexec: Add the EFI system tables and ACPI
tables to the ident map")

Later in the following 2 commits, EFI config table will be accessed when
enabling sev at kernel startup. This may result in a page fault due to EFI
config table's unmapped address. Since the page fault occurs at an early
stage, it is unrecoverable and kernel hangs.

commit ec1c66af3a30 ("x86/compressed/64: Detect/setup SEV/SME features
earlier during boot")
commit c01fce9cef84 ("x86/compressed: Add SEV-SNP feature
detection/setup")

In addition, the issue doesn't appear on all systems, because the kexec
kernel uses Page Size Extension (PSE) for identity mapping. In most cases,
EFI config table can end up to be mapped into due to 1 GB page size.
However if nogbpages is set, or cpu doesn't support pdpe1gb feature
(e.g Intel Atom x6425RE cpu), EFI config table may not be mapped into
due to 2 MB page size, thus a page fault hang is more likely to happen.

In this patch, we will make sure the EFI config table is always mapped.
Signed-off-by: Tao Liu --- arch/x86/kernel/machine_kexec_64.c | 35 ++ 1 file changed, 31 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c index 1a3e2c05a8a5..755aa12f583f 100644 --- a/arch/x86/kernel/machine_kexec_64.c +++ b/arch/x86/kernel/machine_kexec_64.c @@ -28,6 +28,7 @@ #include #include #include +#include #ifdef CONFIG_ACPI /* @@ -86,10 +87,12 @@ const struct kexec_file_ops * const kexec_file_loaders[] = { #endif static int -map_efi_systab(struct x86_mapping_info *info, pgd_t *level4p) +map_efi_sys_cfg_tab(struct x86_mapping_info *info, pgd_t *level4p) { #ifdef CONFIG_EFI unsigned long mstart, mend; + void *kaddr; + int ret; if (!efi_enabled(EFI_BOOT)) return 0; @@ -105,6 +108,30 @@ map_efi_systab(struct x86_mapping_info *info, pgd_t *level4p) if (!mstart) return 0; + ret = kernel_ident_mapping_init(info, level4p, mstart, mend); + if (ret) + return ret; + + kaddr = memremap(mstart, mend - mstart, MEMREMAP_WB); + if (!kaddr) { + pr_err("Could not map UEFI system table\n"); + return -ENOMEM; + } + + mstart = efi_config_table; + + if (efi_enabled(EFI_64BIT)) { + efi_system_table_64_t *stbl = (efi_system_table_64_t *)kaddr; + + mend = mstart + sizeof(efi_config_table_64_t) * stbl->nr_tables; + } else { + efi_system_table_32_t *stbl = (efi_system_table_32_t *)kaddr; + + mend = mstart + sizeof(efi_config_table_32_t) * stbl->nr_tables; + } + + memunmap(kaddr); + return kernel_ident_mapping_init(info, level4p, mstart, mend); #endif return 0; @@ -244,10 +271,10 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable) } /* -* Prepare EFI systab and ACPI tables for kexec kernel since they are -* not covered by pfn_mapped. +* Prepare EFI systab, config table and ACPI tables for kexec kernel +* since they are not covered by pfn_mapped. 
*/ - result = map_efi_systab(, level4p); + result = map_efi_sys_cfg_tab(, level4p); if (result) return result; -- 2.33.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
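The page-size argument in the changelog can be checked with a little arithmetic. The sketch below assumes that identity-mapping a range with large pages maps every whole page the range touches; covered_by_mapping() and page_base() are hypothetical helper names, and the addresses used with them are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_2M (UINT64_C(2) << 20)
#define SZ_1G (UINT64_C(1) << 30)

/* Round addr down to the start of the page_size-aligned page holding it. */
static uint64_t page_base(uint64_t addr, uint64_t page_size)
{
	return addr & ~(page_size - 1);
}

/*
 * Model: mapping [mapped_start, mapped_end) with pages of 'page_size'
 * (a power of two) maps every whole page the range touches. Return true
 * when 'target' lands inside one of those pages. Assumes
 * mapped_end > mapped_start and no wrap-around at the top of memory.
 */
static bool covered_by_mapping(uint64_t mapped_start, uint64_t mapped_end,
			       uint64_t page_size, uint64_t target)
{
	uint64_t first = page_base(mapped_start, page_size);
	uint64_t last = page_base(mapped_end - 1, page_size) + page_size;

	return target >= first && target < last;
}
```

For example, with a system table at 0x7f5ee018 and a config table at 0x7f2d0000 (both invented), a 1 GB mapping of the system table also covers the config table, while a 2 MB mapping does not, which matches the behaviour described for CPUs without pdpe1gb.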
Die-cast
Good morning, we provide aluminium high-pressure die casting technology. We have production plants in Poland, Sweden and China, with the ability to move production flexibly between sites. Our casting cells are mostly automatic or semi-automatic, which allows the production of large series with high flexibility in part details. We provide support at every stage of project development and develop the part design. Would you like to talk about cooperation in this area? Regards, Kristián Pletánek