** Changed in: linux (Ubuntu Bionic)
       Status: Triaged => Fix Committed

** Changed in: linux (Ubuntu Focal)
       Status: Triaged => Fix Committed

** Changed in: linux (Ubuntu Groovy)
       Status: Triaged => Fix Committed

** Changed in: linux (Ubuntu Hirsute)
       Status: Triaged => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1933074

Title:
  large_dir in ext4 broken

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Focal:
  Fix Committed
Status in linux source package in Groovy:
  Fix Committed
Status in linux source package in Hirsute:
  Fix Committed
Status in linux source package in Impish:
  Triaged

Bug description:
  == SRU, Bionic, Focal, Groovy, Hirsute, Impish ==

  [Impact]

  Creating millions of files on an ext4 partition with large_dir support by
  touching them will eventually trip an ext4 leaf node issue in the
  index hash. This occurs more frequently with smaller block sizes and
  ends with either an EEXIST or EUCLEAN failure.

  This occurs on the restart condition when performing do_split().

  [ Fix ]

  The fix protects do_split() from the restart condition, making it safe
  from both current and future ordering of goto statements in earlier
  sections of the code.

  The fix is from a patch that was sent upstream and cc'd to Ted Ts'o,
  but it did not appear on the ext4 mailing list, presumably because it
  was marked as spam.

  [ Test Case ]

  Without the fix, touching tens of thousands of empty files will trip
  the issue. It seems to occur more frequently under memory pressure and
  with smaller block sizes, e.g.:

  sudo mkdir -p /mnt/tmpfs /mnt/storage
  sudo mount -t tmpfs -o size=9000M tmpfs /mnt/tmpfs
  sudo dd if=/dev/urandom of=/mnt/tmpfs/ext4.img bs=1M
  sudo mkfs.ext4 -O large_dir -N 21000000 -O dir_index /mnt/tmpfs/ext4.img -b 1024 -F
  sudo mount /mnt/tmpfs/ext4.img /mnt/storage

  and compile and run the attached C program (see
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1933074/+attachment/5509402/+files/touch.c)
  that quickly populates /mnt/storage with empty files.  Without the fix,
  this will terminate with an -EEXIST or -EUCLEAN error on file
  creation after several tens of thousands of files.
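
  The population step can also be approximated without compiling
  anything. The following is only a sketch under stated assumptions:
  populate_dir is a hypothetical helper (it is not the attached touch.c
  and will be much slower), and the file_N naming is assumed. It stops
  at the first failed creation, which is where an -EEXIST or -EUCLEAN
  error would surface on an affected kernel.

```shell
# Sketch only: populate_dir is a hypothetical stand-in for the attached
# touch.c, not a copy of it.
#
# populate_dir DIR COUNT
#   Creates empty files DIR/file_1 .. DIR/file_COUNT, stopping at the first
#   failure. Prints the number of files created and returns non-zero if a
#   creation failed.
#
# The body runs in a subshell, so the variables stay local and 'exit' only
# ends the function, not the calling shell.
populate_dir() (
    dir=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        # Redirecting the output of 'true' creates (or truncates) the file
        # without starting an external touch process.
        if ! true > "$dir/file_$i"; then
            echo "creation failed at file_$i" >&2
            echo $((i - 1))
            exit 1
        fi
        i=$((i + 1))
    done
    echo "$count"
)

# Example (assumes the image from the steps above is mounted on /mnt/storage
# and that the directory is writable by the current user):
#   populate_dir /mnt/storage 21000000
```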

  [Where problems could occur]

  This changes the behaviour of the directory index hashing, so there
  is regression potential: it may introduce subsequent index hashing
  issues when a split is needed (or not).  This patch seems
  to cover all the necessary cases, so I believe this risk is relatively
  low.  I have also tested this on all the kernel series in the SRU with
  21,000,000 files, so I am confident we have enough test coverage to
  show the fix is OK.

  ----------------------------------------------------------

  I believe I found a bug in ext4 in recent kernel versions.
  I stumbled across this while I was trying to restore a backup to a new VM.

  How to reproduce this bug:

  1. Use a virtual/physical machine with "Ubuntu 18.04.5 LTS" and kernel version 4.15.0-144-generic.
  2. Add a secondary disk to hold the test files.
  3. Prepare and mount the filesystem with the 'large_dir' flag enabled:
  mkfs.ext4 -m0 /dev/sdb1;
  tune2fs -O large_dir /dev/sdb1;
  mkdir /mnt/storage;
  mount /dev/sdb1 /mnt/storage;
  4. Change to the directory and create files until the error appears (approx. 16 million; the loop stops at 20 million):
  cd /mnt/storage;
  i=0;
  while (( $i < 20000000 )); do
    i=$(( $i + 1 ));
    (( $i % 1000 == 0 )) && echo $i;
    touch file_$i.dat || break;
  done
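
  Before starting the loop in step 4, it can be worth confirming that
  step 3 really enabled the feature. A minimal sketch, assuming the
  "Filesystem features:" line that dumpe2fs -h and tune2fs -l print;
  has_fs_feature is a hypothetical helper name, not part of e2fsprogs:

```shell
# Sketch only: has_fs_feature is a hypothetical helper name.
#
# has_fs_feature FEATURE
#   Reads `dumpe2fs -h` (or `tune2fs -l`) output on stdin and succeeds if the
#   "Filesystem features:" line lists FEATURE as a separate word.
has_fs_feature() {
    awk -v want="$1" '
        /^Filesystem features:/ {
            # Fields 1 and 2 are the label; the features start at field 3.
            for (i = 3; i <= NF; i++) if ($i == want) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example (device name taken from the steps above; usually needs root):
#   dumpe2fs -h /dev/sdb1 2>/dev/null | has_fs_feature large_dir \
#       && echo "large_dir is enabled"
```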

  Expected behaviour:
  - 20 million files should be created without error

  What happened instead:
  - The loop aborts with an error message:
  # 16263100
  # touch: cannot touch 'file_16263173.dat': Structure needs cleaning
  - dmesg gives a little more detail:
  # [Mon Jun 21 03:15:18 2021] EXT4-fs error (device sdb): dx_probe:855: inode #2: block 146221: comm touch: directory leaf block found instead of index block
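
  To pick this kind of message out of a longer kernel log, a small
  filter can help. This is only a sketch: dx_probe_errors is a
  hypothetical helper name, and the pattern is modelled on the single
  dmesg line quoted above, so other kernels may word the message
  differently.

```shell
# Sketch only: dx_probe_errors is a hypothetical helper name.
#
# Reads kernel log text on stdin and prints only the EXT4-fs errors that were
# raised from dx_probe, the function named in the dmesg line quoted above.
dx_probe_errors() {
    grep -E 'EXT4-fs error \(device [^)]+\): dx_probe:'
}

# Example (reading the kernel log may need root on some systems):
#   dmesg | dx_probe_errors
```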

  Additional notes:
  - This occurs on kernel version 4.15.0-144-generic
  - Not sure, but I believe one test was run on 4.15.0-143-generic and failed too.
  - Did not check against 4.15.0-142-generic
  - On 4.15.0-141-generic, the problem does not exist. Behaviour is as expected.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1933074/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
