On Thu, Jan 30, 2014 at 1:18 PM, Filipe David Manana <fdman...@gmail.com> wrote: > On Thu, Jan 30, 2014 at 3:11 AM, Darrick J. Wong > <darrick.w...@oracle.com> wrote: >> On Sun, Jan 12, 2014 at 02:22:46AM +0000, Filipe David Borba Manana wrote: >>> After the change titled "Btrfs: add support for inode properties", if >>> btrfs was built-in the kernel (i.e. not as a module), it would cause a >>> kernel panic, as reported recently by Fengguang: >>> >>> [ 2.024722] BUG: unable to handle kernel NULL pointer dereference at >>> (null) >>> [ 2.027814] IP: [<ffffffff81501594>] crc32c+0xc/0x6b >>> [ 2.028684] PGD 0 >>> [ 2.028684] Oops: 0000 [#1] SMP >>> [ 2.028684] Modules linked in: >>> [ 2.028684] CPU: 0 PID: 1 Comm: swapper/0 Not tainted >>> 3.13.0-rc7-04795-ga7b57c2 #1 >>> [ 2.028684] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 >>> [ 2.028684] task: ffff88000edba100 ti: ffff88000edd6000 task.ti: >>> ffff88000edd6000 >>> [ 2.028684] RIP: 0010:[<ffffffff81501594>] [<ffffffff81501594>] >>> crc32c+0xc/0x6b >>> [ 2.028684] RSP: 0000:ffff88000edd7e58 EFLAGS: 00010246 >>> [ 2.028684] RAX: 0000000000000000 RBX: ffffffff82295550 RCX: >>> 0000000000000000 >>> [ 2.028684] RDX: 0000000000000011 RSI: ffffffff81efe393 RDI: >>> 00000000fffffffe >>> [ 2.028684] RBP: ffff88000edd7e60 R08: 0000000000000003 R09: >>> 0000000000015d20 >>> [ 2.028684] R10: ffffffff81ef225e R11: ffffffff811b0222 R12: >>> ffffffffffffffff >>> [ 2.028684] R13: 0000000000000239 R14: 0000000000000000 R15: >>> 0000000000000000 >>> [ 2.028684] FS: 0000000000000000(0000) GS:ffff88000fa00000(0000) >>> knlGS:0000000000000000 >>> [ 2.028684] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>> [ 2.028684] CR2: 0000000000000000 CR3: 000000000220c000 CR4: >>> 00000000000006f0 >>> [ 2.028684] Stack: >>> [ 2.028684] ffffffff82295550 ffff88000edd7e80 ffffffff8238af62 >>> ffffffff8238ac05 >>> [ 2.028684] 0000000000000000 ffff88000edd7e98 ffffffff8238ac0f >>> ffffffff8238ac05 >>> [ 2.028684] ffff88000edd7f08 ffffffff810002ba ffff88000edd7f00 >>> ffffffff810e2404 >>> [ 2.028684] Call Trace: >>> [ 2.028684] [<ffffffff8238af62>] btrfs_props_init+0x4f/0x96 >>> [ 2.028684] [<ffffffff8238ac05>] ? >>> ftrace_define_fields_btrfs_space_reservation+0x145/0x145 >>> [ 2.028684] [<ffffffff8238ac0f>] init_btrfs_fs+0xa/0xf0 >>> [ 2.028684] [<ffffffff8238ac05>] ? >>> ftrace_define_fields_btrfs_space_reservation+0x145/0x145 >>> [ 2.028684] [<ffffffff810002ba>] do_one_initcall+0xa4/0x13a >>> [ 2.028684] [<ffffffff810e2404>] ? parse_args+0x25f/0x33d >>> [ 2.028684] [<ffffffff8234cf75>] kernel_init_freeable+0x1aa/0x230 >>> [ 2.028684] [<ffffffff8234c785>] ? do_early_param+0x88/0x88 >>> [ 2.028684] [<ffffffff819f61b5>] ? rest_init+0x89/0x89 >>> [ 2.028684] [<ffffffff819f61c3>] kernel_init+0xe/0x109 >>> >>> The issue here is that the initialization function of btrfs >>> (super.c:init_btrfs_fs) >>> started using crc32c (from lib/libcrc32c.c). But when it needs to call >>> crc32c (as >>> part of the properties initialization routine), the libcrc32c is not yet >>> initialized, >>> so crc32c derreferenced a NULL pointer (lib/libcrc32c.c:tfm), causing the >>> kernel >>> panic on boot. >>> >>> The approach to fix this is to use crypto component directly to use its >>> crc32c (which >>> is basically what lib/libcrc32c.c is, a wrapper around crypto). This is >>> what ext4 is >>> doing as well, it uses crypto directly to get crc32c functionality. >> >> Sorry, I didn't even see this until your other patch today... >> >> Yes, ext4 bypasses libcrc32c, but notice how ext4/jbd2 only call >> crypto_alloc_shash() when you mount the filesystem. This silly trick enables >> ext4 to take advantage of faster crc32c implementations that don't get loaded >> until after the initial insmod, just in case you do this: >> >> # modprobe btrfs >> # mount /dev/X / <-- / uses software crc32c implementation >> # modprobe crc32c-intel >> # mount /dev/Y /home <-- /home still uses software crc32c, because >> *tfm still points to the sw >> implementation! >> >> Of course, one could argue that this is a distro problem, and that the >> initramfs or kernel config should take care to load the appropriate hardware >> accelerated drivers /before/ btrfs. >> >> On the other hand, btrfs might as well always enjoy the fastest >> implementation >> that it can get its hands on. > > Thanks Darrick, that's very useful information. > > So, if I understood it correctly, before this change btrfs was using > libcrc32c, which is initialized only once and therefore it's possible > it wouldn't use crc32c-intel - if crc32c-intel was built as a module > and not loaded before libcrc32c is initialized. > > Also that ext4 approach, of using crypto_alloc_shash() on each mount > (and make the result accessible via super block structure) doesn't > solve the case where the partition is mounted before crc32c-intel is > loaded but then it's never unmounted and mounted again (like a root or > home partition). By solve I mean not profiting from intel's hardware > based crc32c. > > I suppose that if crc32c-intel is built-in (i.e. not as a module) that > either with the current approach (after this patch) or with the old > approach, we'll be using crc32c-intel. Is there any way to verify > this?
Just answered my own last question by printing tfm->base.__crt_alg->cra_driver_name in fs/btrfs/hash.c:btrfs_hash_init(), which printed "crc32c-intel". > >> >> (Also I suppose metadata_csum is optional in ext4/jbd2...) >> >> --D >> >>> >>> Verified this works both when btrfs is built-in and when it's loadable >>> kernel module. >>> >>> Signed-off-by: Filipe David Borba Manana <fdman...@gmail.com> >>> --- >>> >>> V2: Removed unnecessary __exit qualifier. >>> >>> fs/btrfs/Kconfig | 3 ++- >>> fs/btrfs/Makefile | 2 +- >>> fs/btrfs/extent-tree.c | 6 +++--- >>> fs/btrfs/hash.c | 49 >>> ++++++++++++++++++++++++++++++++++++++++++++++++ >>> fs/btrfs/hash.h | 11 ++++++++--- >>> fs/btrfs/super.c | 10 +++++++++- >>> 6 files changed, 72 insertions(+), 9 deletions(-) >>> create mode 100644 fs/btrfs/hash.c >>> >>> diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig >>> index aa976ec..a66768e 100644 >>> --- a/fs/btrfs/Kconfig >>> +++ b/fs/btrfs/Kconfig >>> @@ -1,6 +1,7 @@ >>> config BTRFS_FS >>> tristate "Btrfs filesystem support" >>> - select LIBCRC32C >>> + select CRYPTO >>> + select CRYPTO_CRC32C >>> select ZLIB_INFLATE >>> select ZLIB_DEFLATE >>> select LZO_COMPRESS >>> diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile >>> index af7f000..f341a98 100644 >>> --- a/fs/btrfs/Makefile >>> +++ b/fs/btrfs/Makefile >>> @@ -9,7 +9,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o >>> root-tree.o dir-item.o \ >>> export.o tree-log.o free-space-cache.o zlib.o lzo.o \ >>> compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \ >>> reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \ >>> - uuid-tree.o props.o >>> + uuid-tree.o props.o hash.o >>> >>> btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o >>> btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o >>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c >>> index 77acc08..db19ed7 100644 >>> --- a/fs/btrfs/extent-tree.c >>> +++ b/fs/btrfs/extent-tree.c >>> @@ -1072,11 +1072,11 @@ static u64 hash_extent_data_ref(u64 root_objectid, >>> u64 owner, u64 offset) >>> __le64 lenum; >>> >>> lenum = cpu_to_le64(root_objectid); >>> - high_crc = crc32c(high_crc, &lenum, sizeof(lenum)); >>> + high_crc = btrfs_crc32c(high_crc, &lenum, sizeof(lenum)); >>> lenum = cpu_to_le64(owner); >>> - low_crc = crc32c(low_crc, &lenum, sizeof(lenum)); >>> + low_crc = btrfs_crc32c(low_crc, &lenum, sizeof(lenum)); >>> lenum = cpu_to_le64(offset); >>> - low_crc = crc32c(low_crc, &lenum, sizeof(lenum)); >>> + low_crc = btrfs_crc32c(low_crc, &lenum, sizeof(lenum)); >>> >>> return ((u64)high_crc << 31) ^ (u64)low_crc; >>> } >>> diff --git a/fs/btrfs/hash.c b/fs/btrfs/hash.c >>> new file mode 100644 >>> index 0000000..ab3a938 >>> --- /dev/null >>> +++ b/fs/btrfs/hash.c >>> @@ -0,0 +1,49 @@ >>> +/* >>> + * Copyright (C) 2014 Filipe David Borba Manana <fdman...@gmail.com> >>> + * >>> + * This program is free software; you can redistribute it and/or >>> + * modify it under the terms of the GNU General Public >>> + * License v2 as published by the Free Software Foundation. >>> + * >>> + * This program is distributed in the hope that it will be useful, >>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of >>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >>> + * General Public License for more details. >>> + */ >>> + >>> +#include <crypto/hash.h> >>> +#include "hash.h" >>> + >>> +static struct crypto_shash *tfm; >>> + >>> +int __init btrfs_hash_init(void) >>> +{ >>> + tfm = crypto_alloc_shash("crc32c", 0, 0); >>> + if (IS_ERR(tfm)) >>> + return PTR_ERR(tfm); >>> + >>> + return 0; >>> +} >>> + >>> +void btrfs_hash_exit(void) >>> +{ >>> + crypto_free_shash(tfm); >>> +} >>> + >>> +u32 btrfs_crc32c(u32 crc, const void *address, unsigned int length) >>> +{ >>> + struct { >>> + struct shash_desc shash; >>> + char ctx[crypto_shash_descsize(tfm)]; >>> + } desc; >>> + int err; >>> + >>> + desc.shash.tfm = tfm; >>> + desc.shash.flags = 0; >>> + *(u32 *)desc.ctx = crc; >>> + >>> + err = crypto_shash_update(&desc.shash, address, length); >>> + BUG_ON(err); >>> + >>> + return *(u32 *)desc.ctx; >>> +} >>> diff --git a/fs/btrfs/hash.h b/fs/btrfs/hash.h >>> index 1d98281..118a231 100644 >>> --- a/fs/btrfs/hash.h >>> +++ b/fs/btrfs/hash.h >>> @@ -19,10 +19,15 @@ >>> #ifndef __HASH__ >>> #define __HASH__ >>> >>> -#include <linux/crc32c.h> >>> +int __init btrfs_hash_init(void); >>> + >>> +void btrfs_hash_exit(void); >>> + >>> +u32 btrfs_crc32c(u32 crc, const void *address, unsigned int length); >>> + >>> static inline u64 btrfs_name_hash(const char *name, int len) >>> { >>> - return crc32c((u32)~1, name, len); >>> + return btrfs_crc32c((u32)~1, name, len); >>> } >>> >>> /* >>> @@ -31,7 +36,7 @@ static inline u64 btrfs_name_hash(const char *name, int >>> len) >>> static inline u64 btrfs_extref_hash(u64 parent_objectid, const char *name, >>> int len) >>> { >>> - return (u64) crc32c(parent_objectid, name, len); >>> + return (u64) btrfs_crc32c(parent_objectid, name, len); >>> } >>> >>> #endif >>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c >>> index 461e41c..f44cc6a 100644 >>> --- a/fs/btrfs/super.c >>> +++ b/fs/btrfs/super.c >>> @@ -48,6 +48,7 @@ >>> #include "transaction.h" >>> #include "btrfs_inode.h" >>> #include "print-tree.h" >>> +#include "hash.h" >>> #include "props.h" >>> #include "xattr.h" >>> #include "volumes.h" >>> @@ -1866,11 +1867,15 @@ static int __init init_btrfs_fs(void) >>> { >>> int err; >>> >>> + err = btrfs_hash_init(); >>> + if (err) >>> + return err; >>> + >>> btrfs_props_init(); >>> >>> err = btrfs_init_sysfs(); >>> if (err) >>> - return err; >>> + goto free_hash; >>> >>> btrfs_init_compress(); >>> >>> @@ -1945,6 +1950,8 @@ free_cachep: >>> free_compress: >>> btrfs_exit_compress(); >>> btrfs_exit_sysfs(); >>> +free_hash: >>> + btrfs_hash_exit(); >>> return err; >>> } >>> >>> @@ -1963,6 +1970,7 @@ static void __exit exit_btrfs_fs(void) >>> btrfs_exit_sysfs(); >>> btrfs_cleanup_fs_uuids(); >>> btrfs_exit_compress(); >>> + btrfs_hash_exit(); >>> } >>> >>> module_init(init_btrfs_fs) >>> -- >>> 1.7.9.5 >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>> the body of a message to majord...@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > -- > Filipe David Manana, > > "Reasonable men adapt themselves to the world. > Unreasonable men adapt the world to themselves. > That's why all progress depends on unreasonable men." -- Filipe David Manana, "Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That's why all progress depends on unreasonable men." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html