4.9 (and earlier) LTS kernels are missing this:

commit ec00022030da5761518476096626338bd67df57a
Author: Tahsin Erdogan <tah...@google.com>
Date:   Sat Aug 5 22:41:42 2017 -0400

    ext4: inplace xattr block update fails to deduplicate blocks

Is it OK to backport it?
I tested it briefly on 4.9 and it seems to work.

One of our testers noticed a glusterfs performance regression when going from 4.4 to 4.9, caused by the duplicated blocks.

If I understand everything correctly, in 4.4 mbcache uses the block number in the hash table bucket calculation, so the hash table is populated quite evenly even when there are duplicates, and the mbcache stays fast.

But in later kernels mbcache puts all the duplicate entries into a single bucket. As the entries in a bucket are stored in one big linked list, every lookup has to walk all of them, which makes the mbcache slow.
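
To illustrate the difference, here is a toy userspace model of the two bucket choices described above. It is only a sketch and not the mbcache code: toy_hash(), bucket_by_key(), bucket_by_block(), longest_chain() and the table size are made-up names and values, and the starting block number 1000 is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUCKET_BITS 6u
#define NBUCKETS    (1u << BUCKET_BITS)

/* Toy multiplicative hash (hypothetical, not the kernel's hash function). */
static unsigned toy_hash(uint32_t v)
{
	return (uint32_t)(v * 2654435761u) >> (32u - BUCKET_BITS);
}

/* Bucket derived from the content hash only: identical xattr blocks collide. */
static unsigned bucket_by_key(uint32_t key)
{
	return toy_hash(key);
}

/* Bucket derived from the block number: duplicates land in different buckets. */
static unsigned bucket_by_block(uint64_t block)
{
	return toy_hash((uint32_t)block);
}

/*
 * Insert n duplicate blocks (same key, consecutive block numbers) and
 * return the length of the longest bucket chain that results.
 */
static size_t longest_chain(size_t n, uint32_t key, int use_block)
{
	size_t len[NBUCKETS] = {0};
	size_t max = 0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		unsigned b = use_block ? bucket_by_block(1000 + i)
				       : bucket_by_key(key);

		if (++len[b] > max)
			max = len[b];
	}
	return max;
}
```

With the key-only choice, longest_chain(n, key, 0) is always n, so a lookup for a duplicated block walks every duplicate. With the block-number choice the same entries are spread over several buckets and the chains are shorter.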

I tested this in 4.9 (which still has the ext4_xattr_rehash() call that got eliminated in commit "ext4: eliminate xattr entry e_hash recalculation for removes"):

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 3eeed8f0aa06..3fadfabcac39 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -837,8 +837,6 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
                                if (!IS_LAST_ENTRY(s->first))
-                               ext4_xattr_cache_insert(ext4_mb_cache,
-                                       bs->bh);
                        ext4_xattr_block_csum_set(inode, bs->bh);
@@ -959,6 +957,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
                } else if (bs->bh && s->base == bs->bh->b_data) {
                        /* We were modifying this block in-place. */
                        ea_bdebug(bs->bh, "keeping this block");
+                       ext4_xattr_cache_insert(ext4_mb_cache, bs->bh);
                        new_bh = bs->bh;
                } else {


