On Fri, Mar 05, 2021 at 09:48:50PM -0500, Theodore Ts'o wrote:
> I can't reproduce the problem given your file system image.  Given
> your description, this is almost certainly operator error.

OK, I was finally able to reproduce the problem, but not using your
reproduction instructions.  I reproduced it via:

1.  e2image -r rbd13.e2i.qcow2 /tmp/rbd13
2.  truncate -s 10T /tmp/rbd13
3.  e2fsck -fy /tmp/rbd13
4.  resize2fs /tmp/rbd13
5.  e2fsck -fy /tmp/rbd13
      <note fs corruption>

The bug report was also incorrect in saying that resizing the file
system to 4T was sufficient; that is not true.  The problem can only
be reproduced when the file system is resized to be large enough that
there is no longer enough room to grow the block group descriptors
without moving the allocation bitmaps and/or inode table out of the
way.  (As in the above reproduction recipe.)
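
For a rough sense of scale, the sketch below counts how many blocks the
block group descriptors need at a given file system size.  It assumes
4k blocks, 32768 blocks per group, and 64-byte descriptors (that is,
the 64bit feature); I have not checked any of those against this
image, so treat the numbers as illustrative.  The gdt_blocks helper is
made up for this example.

```shell
#!/bin/sh
# Assumptions (not verified against the image): 4096-byte blocks,
# 32768 blocks per group, 64-byte group descriptors.
gdt_blocks() {
    # $1 = file system size in blocks
    blocks=$1
    blocks_per_group=32768
    desc_per_block=$((4096 / 64))
    # Round up to whole block groups, then to whole descriptor blocks.
    groups=$(( ($blocks + $blocks_per_group - 1) / $blocks_per_group ))
    echo $(( ($groups + $desc_per_block - 1) / $desc_per_block ))
}

gdt_blocks 268435456                                   # current block count: 128
gdt_blocks $((4 * 1024 * 1024 * 1024 * 1024 / 4096))   # grown to 4T: 512
gdt_blocks $((10 * 1024 * 1024 * 1024 * 1024 / 4096))  # grown to 10T: 1280
```

Whether a given target size still fits depends on how many descriptor
blocks were reserved when the file system was created, which this
sketch does not look at.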

% e2image -r rbd13.e2i.qcow2 /tmp/rbd13
e2image 1.46.2 (28-Feb-2021)
% truncate -s 4T /tmp/rbd13
% resize2fs  /tmp/rbd13
resize2fs 1.46.2 (28-Feb-2021)
Please run 'e2fsck -f /tmp/rbd13' first.
% e2fsck -f /tmp/rbd13
e2fsck 1.46.2 (28-Feb-2021)
e2fsck: MMP: e2fsck being run while checking MMP block
MMP check failed: If you are sure the filesystem is not in use on any node, run:
'tune2fs -f -E clear_mmp /tmp/rbd13'
MMP_block:
    mmp_magic: 0x4d4d50
    mmp_check_interval: 10
    mmp_sequence: e24d4d50
    mmp_update_date: Sat Mar  6 22:47:25 2021
    mmp_update_time: 1615088845
    mmp_node_name: cwcc
    mmp_device_name: /tmp/rbd13

/tmp/rbd13: ********** WARNING: Filesystem still has errors **********

This is a separate bug.  The issue here is that resize2fs exits after
printing the "Please run 'e2fsck -f /tmp/rbd13' first." message
without cleanly stopping (resetting) the MMP protection.  This doesn't
lead to file system corruption, though, as we can see here:

% tune2fs -f -E clear_mmp /tmp/rbd13
tune2fs 1.46.2 (28-Feb-2021)
% e2fsck -fy /tmp/rbd13
e2fsck 1.46.2 (28-Feb-2021)
Clearing orphaned inode 45617124 (uid=107, gid=115, mode=0100600, size=16777216)
Clearing orphaned inode 15073744 (uid=0, gid=0, mode=0100644, size=593696)
Clearing orphaned inode 15073743 (uid=0, gid=0, mode=0100644, size=3031904)
Clearing orphaned inode 50331709 (uid=0, gid=0, mode=0100644, size=149704)
Clearing orphaned inode 50332495 (uid=0, gid=0, mode=0100755, size=231560)
Clearing orphaned inode 50332319 (uid=0, gid=0, mode=0100644, size=2670992)
Clearing orphaned inode 50332271 (uid=0, gid=0, mode=0100644, size=651472)
Clearing orphaned inode 50332251 (uid=0, gid=0, mode=0100644, size=282752)
Clearing orphaned inode 13 (uid=0, gid=0, mode=0100600, size=0)
Pass 1: Checking inodes, blocks, and sizes
Inode 46530577 extent tree (at level 1) could be shorter.  Optimize? yes

Inode 46530714 extent tree (at level 1) could be shorter.  Optimize? yes

Pass 1E: Optimizing extent trees
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/tmp/rbd13: ***** FILE SYSTEM WAS MODIFIED *****
/tmp/rbd13: 225913/67108864 files (8.6% non-contiguous), 185961407/268435456 blocks


The workaround to the first bug is to clear the MMP feature, do the
offline resize, and then enable the MMP feature again.  Of course,
while the MMP feature is disabled you won't be protected against
another node trying to modify the file system.  But it will allow you
to grow the file system.
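
As a sketch, that sequence might look like the following.  The run and
resize_without_mmp helpers are invented for this example, and DRY_RUN
defaults to 1 so that the script only prints what it would do.  If
tune2fs refuses to touch the image because of a stale MMP block, the
clear_mmp step shown above would presumably have to come first; I have
not tested that combination.

```shell
#!/bin/sh
# Sketch of the workaround: drop MMP, resize offline, re-enable MMP.
# With DRY_RUN=1 (the default here) the commands are only printed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

resize_without_mmp() {
    dev=$1
    # Disable MMP; from here on nothing stops another node from
    # modifying the file system, so make sure it is not in use.
    run tune2fs -O '^mmp' "$dev" || return 1
    # resize2fs insists on a fresh forced fsck before an offline resize.
    run e2fsck -f "$dev" || return 1
    # With no size argument, grow to fill the device (or image file).
    run resize2fs "$dev" || return 1
    # Turn MMP back on.
    run tune2fs -O mmp "$dev"
}

resize_without_mmp /tmp/rbd13
```

Run it with DRY_RUN=0 only once you are sure no other node has the
file system mounted.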

Another workaround is to simply do an online resize (that is, on the
node where the file system is mounted, run resize2fs on the mounted
file system).  However, depending on the kernel version, it won't
allow you to resize past the limit set by the block group descriptor
blocks reserved via the resize inode.
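
To see how much room was reserved, you can pull the "Reserved GDT
blocks" line out of the superblock dump.  The reserved_gdt_blocks
helper below is invented for this example; it just filters dumpe2fs -h
output on stdin, and prints nothing if the field is absent.

```shell
#!/bin/sh
# Extract the reserved GDT block count from `dumpe2fs -h` output
# supplied on stdin.
reserved_gdt_blocks() {
    awk -F: '/^Reserved GDT blocks/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Usage, on the node where the file system is mounted:
#   dumpe2fs -h /dev/<device> 2>/dev/null | reserved_gdt_blocks
```

Comparing that number against the descriptor blocks needed at the
target size gives a rough idea of whether an online resize has a chance
of getting there on an older kernel.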

                                                - Ted
