Bug#319002: e2fsprogs: tune2fs -O +has_journal on a mounted fs created a corrupt fs

Ariel Tue, 19 Jul 2005 15:16:33 -0700


On Tue, 19 Jul 2005, Theodore Ts'o wrote:

On Tue, Jul 19, 2005 at 02:46:42AM -0400, Ariel wrote:

Package: e2fsprogs
Version: 1.37-2
Severity: important

This was running on a system with a ro / fs changed via remount to rw.
(So /etc/mtab was not real, since nothing rewrote it afterward.)
I don't know if this is relevant, since I did the exact same thing to
another fs in the same session and that worked fine: it created a
.journal file.


This was the root cause.  /etc/mtab has to be real, because tune2fs
needs to know whether or not the device is mounted.  Actually, the
library will check /proc/mounts first, so presumably you didn't have
/proc mounted either.  If you don't have /proc mounted, but you are
mounting disks and then trying to use tune2fs to add journals, you're
just asking for trouble.
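Following Ted's description, the mount check can be modeled as: consult /proc/mounts first, then /etc/mtab, and treat the device as mounted if either table lists it. This is a simplified Python sketch with hypothetical function names, not libext2fs itself (the real check lives in `ext2fs_check_if_mounted()`, in C). It compares device names as plain strings only; real code would also need to compare device numbers (`st_rdev`) so that aliases of the same device match.

```python
def listed_in_mount_table(device: str, table_text: str) -> bool:
    """Return True if `device` is the first field of any line in a
    mounts-format table (the format shared by /proc/mounts and /etc/mtab)."""
    for line in table_text.splitlines():
        fields = line.split()
        if fields and fields[0] == device:
            return True
    return False


def looks_mounted(device: str,
                  tables=("/proc/mounts", "/etc/mtab")) -> bool:
    """Hypothetical helper: consult /proc/mounts first, then /etc/mtab.

    A missing table (e.g. /proc not mounted) is skipped silently, so if
    both tables are missing or stale the device appears unmounted.
    """
    for path in tables:
        try:
            with open(path) as f:
                if listed_in_mount_table(device, f.read()):
                    return True
        except OSError:
            continue  # table unavailable; try the next one
    return False
```

With /proc unmounted and an /etc/mtab left over from before the remount, both lookups can miss a mounted device, which is the failure mode Ted suspected here.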

I did have /proc mounted! And it worked fine in the same session on another fs. Also, can't the tools tell if a fs is mounted by looking in the superblock (i.e. whether it's dirty)?

And, BTW, it created a .journal file (as you can see in the fsck output), so it must have known it was mounted.
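On the superblock question: the ext2/ext3 primary superblock starts 1024 bytes into the device, with the magic number `s_magic` (0xEF53) at byte 56 and the state word `s_state` at byte 58; its `EXT2_VALID_FS` bit (0x0001) marks a cleanly unmounted filesystem and `EXT2_ERROR_FS` (0x0002) marks detected errors. The minimal Python sketch below (hypothetical helper name) decodes those fields from an image. The limitation is presumably why the tools don't rely on it: a cleared valid bit looks the same whether the filesystem is mounted right now or simply was not unmounted cleanly, so it cannot answer "is it mounted?" on its own.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # ext2/3 primary superblock starts here
EXT2_SUPER_MAGIC = 0xEF53
EXT2_VALID_FS = 0x0001     # set in s_state when cleanly unmounted
EXT2_ERROR_FS = 0x0002     # set in s_state when errors were detected


def superblock_state(image: bytes) -> dict:
    """Decode s_magic and s_state from the primary superblock of an image.
    (Hypothetical helper; offsets follow the on-disk ext2 layout.)"""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 64:
        raise ValueError("image too small to contain a superblock")
    # s_magic is a little-endian u16 at byte 56, s_state follows at byte 58
    magic, state = struct.unpack_from("<HH", sb, 56)
    if magic != EXT2_SUPER_MAGIC:
        raise ValueError(f"bad magic 0x{magic:04x}")
    return {"valid_fs": bool(state & EXT2_VALID_FS),
            "error_fs": bool(state & EXT2_ERROR_FS)}
```

For example, reading the first 2048 bytes of an image such as `d` from the transcript further down and passing them to `superblock_state()` reports both bits.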

It sounds like you were trying to add a journal very early in the
process, in single-user mode.

That's correct. I set SULOGIN in /etc/default/rcS and that gets me a shell where I do this sort of task.

Why were you doing this and having /dev/md1 mounted in the first place?

I was experimenting a bit with journaling: I was trying to create a filesystem with a .journal file, rather than a hidden inode.

I am aware I could have done this more safely and easily; I just wanted to report the bug because I wasn't doing anything unsupported. (All I did was mount the fs and run tune2fs -j on it, nothing exotic, except that /etc/mtab was probably weird, which maybe counts as exotic, but /proc/mounts was there.)
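For reference, the two journal layouts differ in where the journal lives: on an unmounted filesystem tune2fs stores it in the reserved inode 8, while through a mount point it creates a visible /.journal file (inode 18 in the transcript below). That file's debugfs dump shows `File flags [0x50]`. The small Python decoder below uses the standard ext2 inode flag values (the helper name is hypothetical) and shows that 0x50 is immutable (0x10) plus no-dump (0x40), presumably so the journal file can't be modified or backed up through ordinary file access.

```python
# Standard ext2 inode flag bits (low byte only)
EXT2_FLAGS = {
    0x00000001: "SECRM",
    0x00000002: "UNRM",
    0x00000004: "COMPR",
    0x00000008: "SYNC",
    0x00000010: "IMMUTABLE",
    0x00000020: "APPEND",
    0x00000040: "NODUMP",
    0x00000080: "NOATIME",
}


def decode_inode_flags(flags: int) -> list[str]:
    """Return names of known flag bits set in `flags`, lowest bit first;
    any bits outside the table are appended as one hex value."""
    names = [name for bit, name in sorted(EXT2_FLAGS.items()) if flags & bit]
    unknown = flags & ~sum(EXT2_FLAGS)
    if unknown:
        names.append(f"0x{unknown:x}")
    return names


print(decode_inode_flags(0x50))   # ['IMMUTABLE', 'NODUMP']
```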

We can make this better under Linux 2.6, since with 2.6 there is a way
to detect that the device is busy; so even if /etc/mtab is bogus, we
can detect that the device is in use and cause tune2fs to abort in that
case.
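The Linux 2.6 facility Ted mentions is presumably the exclusive open of block devices: open(2) documents that on Linux 2.6 and later, `O_EXCL` without `O_CREAT` on a block device fails with `EBUSY` if the device is in use by the system (e.g. mounted). That check does not depend on /etc/mtab at all. A minimal Python sketch, with a hypothetical function name (e2fsprogs would do the equivalent in C):

```python
import errno
import os


def block_device_busy(path: str) -> bool:
    """Return True if an exclusive open of `path` fails with EBUSY.

    On Linux 2.6+, O_EXCL without O_CREAT on a block device requests an
    exclusive claim, which fails while the device is mounted. For anything
    that is not a block device, Linux ignores O_EXCL here, so this returns
    False. Any other error (missing device, permission) is re-raised.
    """
    try:
        fd = os.open(path, os.O_RDONLY | os.O_EXCL)
    except OSError as e:
        if e.errno == errno.EBUSY:
            return True
        raise
    os.close(fd)
    return False
```

Run as root, `block_device_busy("/dev/md1")` would be expected to return True while /dev/md1 is mounted. A busy result only says that something has claimed the device, not what.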

This was under 2.6.9. It was a RAID device; could that have confused the mount detection?

a: why did tune2fs mess up?


See above.

This is a little bit better in e2fsprogs 1.38 if you are running on a
Linux 2.6 kernel: under 2.6 we can directly detect whether or not the
filesystem is mounted.

b: you'll notice I had to run fsck twice to get a correct fs

I can't replicate this.  If the first fsck cleared .journal because it
was deleted (i_links_count == 0), it should have complained much
earlier if the superblock had the journal inode set to the inode
number of .journal.

The only way I could have imagined this happened is if the filesystem
was mounted when you ran e2fsck on it.  Was that the case?  Normally
e2fsck will complain loudly, but you were running with a bogus
/etc/mtab file without /proc mounted.....

It was not mounted - I rebooted at this point, and it doesn't mount /boot by default, and I certainly didn't mount it.

Is there any combination of inode parameters that could cause this (i.e. fsck needing to be run twice, never mind how the fs got there)? To my uneducated eye, the only real error is that i_links_count was 0 instead of 1; change just that and all the other errors would have gone away.

I tested this by using debugfs to change the link count, and got an even stranger result: fsck.ext3 finds no errors, but I can't mount it!


# mount -o loop d /mnt
mount: wrong fs type, bad option, bad superblock on /dev/loop1,
       or too many mounted file systems

Here is exactly what I did to create this unmountable fs:

# cp /dev/md1 d
# tune2fs -O ^has_journal d
tune2fs 1.37 (21-Mar-2005)
# mount -o loop d /mnt
# losetup -a
/dev/loop1: [fd01]:81924 (d)
# tune2fs -j /dev/loop1
tune2fs 1.37 (21-Mar-2005)
Creating journal inode: done
This filesystem will be automatically checked every 21 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
# umount d
# cp d d1
# debugfs -w d1
debugfs 1.37 (21-Mar-2005)
debugfs:  mi .journal
                          Mode    [0100600]
                       User ID    [0]
                      Group ID    [0]
                          Size    [1048576]
                 Creation time    [1121808381]
             Modification time    [1121808381]
                   Access time    [1121808381]
                 Deletion time    [0]
                    Link count    [1] 0
                   Block count    [2058]
                    File flags    [0x50]
                    Generation    [0x69e95ea8]
                      File acl    [0]
           High 32bits of size    [0]
              Fragment address    [0]
               Fragment number    [0]
                 Fragment size    [0]
               Direct Block #0    [5292]
               Direct Block #1    [5293]
               Direct Block #2    [5294]
               Direct Block #3    [5295]
               Direct Block #4    [5296]
               Direct Block #5    [6437]
               Direct Block #6    [6438]
               Direct Block #7    [6439]
               Direct Block #8    [6440]
               Direct Block #9    [6441]
              Direct Block #10    [6442]
              Direct Block #11    [6443]
                Indirect Block    [6444]
         Double Indirect Block    [6701]
         Triple Indirect Block    [0]
debugfs: quit
# fsck.ext3 -f d1
e2fsck 1.37 (21-Mar-2005)
Backing up journal inode block information.

Pass 1: Checking inodes, blocks, and sizes
Deleted inode 18 has zero dtime.  Fix<y>? yes

Pass 2: Checking directory structure
Entry '.journal' in / (2) has deleted/unused inode 18.  Clear<y>? yes

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -(5292--5296) -(6437--7215) -(7237--7481)
Fix<y>? yes

Free blocks count wrong for group #0 (2495, counted=3524).
Fix<y>? yes

Free blocks count wrong (8986, counted=10015).
Fix<y>? yes

Inode bitmap differences:  -18
Fix<y>? yes

Free inodes count wrong for group #0 (4, counted=5).
Fix<y>? yes

Free inodes count wrong (76, counted=77).
Fix<y>? yes


/boot: ***** FILE SYSTEM WAS MODIFIED *****
/boot: 43/120 files (18.6% non-contiguous), 13089/23104 blocks
# fsck.ext3 -f d1
e2fsck 1.37 (21-Mar-2005)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/boot: 43/120 files (18.6% non-contiguous), 13089/23104 blocks
# mount -o loop d1 /mnt
mount: wrong fs type, bad option, bad superblock on /dev/loop1,
       or too many mounted file systems
       (could this be the IDE device where you in fact use
       ide-scsi so that sr0 or sda or so is needed?)
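The numbers in the first fsck run are consistent with nothing more than the zeroed link count. Assuming 1 KiB blocks (the `Block count [2058]` field counts 512-byte sectors, i.e. 1029 blocks), a 1 MiB file needs 1024 data blocks: 12 mapped directly from the inode, 256 through the single indirect block (6444), and the remaining 756 through the double indirect block (6701) plus three indirect blocks under it. That is 1029 blocks in all, matching the three freed ranges in the block bitmap and both free-block count corrections. The Python below is only a sketch that checks this arithmetic; the helper name is hypothetical and it assumes a non-sparse file with classic 32-bit block pointers.

```python
def ext2_blocks_for_file(size: int, block_size: int) -> tuple[int, int]:
    """Return (data_blocks, mapping_blocks) for a non-sparse file mapped
    with ext2's 12 direct, 1 indirect, 1 double and 1 triple pointer."""
    per = block_size // 4              # 32-bit pointers per mapping block
    data = -(-size // block_size)      # ceiling division
    left = data - min(data, 12)        # after the 12 direct pointers
    meta = 0
    if left:                           # single indirect block
        meta += 1
        left -= min(left, per)
    if left:                           # double indirect + its indirect blocks
        used = min(left, per * per)
        meta += 1 + -(-used // per)
        left -= used
    if left:                           # triple indirect tree
        used = min(left, per ** 3)
        meta += 1 + -(-used // (per * per)) + -(-used // per)
        left -= used
    if left:
        raise ValueError("file too large for ext2 block mapping")
    return data, meta


data, meta = ext2_blocks_for_file(1048576, 1024)   # Size [1048576]
freed_ranges = [(5292, 5296), (6437, 7215), (7237, 7481)]
freed = sum(hi - lo + 1 for lo, hi in freed_ranges)

print(data, meta, data + meta)          # 1024 5 1029
print(2058 // 2, freed)                 # 1029 1029
print(3524 - 2495, 10015 - 8986)        # 1029 1029
```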

I ran e2image -r d - | bzip2 > d.bz2, did the same for d1, and have attached both (10KB each).

I also tried it with a new empty filesystem I created, and that worked properly:

Superblock has a bad ext3 journal (inode 12).
Clear<y>? yes

c: the kernel should probably have reacted a little more sensibly to the
error - i.e. don't send zillions of identical messages - kick out the
fs, maybe return an error to the process trying to read, but not tons of
useless messages that effectively froze the machine. (I checked and it was
mounted errors=continue, since that's the default.)
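On the `errors=continue` point: when no `errors=` mount option is given, the behavior comes from the `s_errors` field of the superblock (a little-endian u16 at byte 60 of the superblock), which `tune2fs -e` can change. The values are 1 = continue, 2 = remount-ro, 3 = panic. A sketch that reads it from an image (hypothetical helper name):

```python
import struct

ERRORS_BEHAVIOR = {1: "continue", 2: "remount-ro", 3: "panic"}


def default_error_behavior(image: bytes) -> str:
    """Read s_errors from the primary superblock (which starts at byte 1024)."""
    magic, = struct.unpack_from("<H", image, 1024 + 56)
    if magic != 0xEF53:
        raise ValueError("not an ext2/ext3 superblock")
    errors, = struct.unpack_from("<H", image, 1024 + 60)
    return ERRORS_BEHAVIOR.get(errors, f"unknown ({errors})")
```

Setting `tune2fs -e remount-ro` on the device would presumably make the kernel stop writing after the first error instead of continuing, which is closer to the "kick out the fs" behavior asked for above.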


Because of (a), tune2fs wrote directly to a mounted filesystem, and
that results in enough filesystem corruption that the kernel was going
to start complaining pretty loudly.  We'd need to see what the
messages were, but in any case, that's not an e2fsprogs issue....

Sorry, I should report this to the kernel guys. But I'm still quite sure tune2fs knew that the fs was mounted.


        -Ariel

d.bz2
d1.bz2
