Re: Add a norecovery option to ext3/4?

2007-04-09 Thread Andreas Dilger
On Apr 08, 2007  22:24 -0500, Eric Sandeen wrote:
 Samuel Thibault wrote:
 Distribution installers usually try to probe OSes for building a suited
 grub menu.  Unfortunately, mounting an ext3 partition, even in read-only
 mode, does perform some operations on the filesystem (log recovery).
 This is not a good idea since it may silently garbage data.  
 
 Can you elaborate?  Under what circumstances is log replay going to harm 
 data?  Do you mean that the installer mounts partitions, looking for 
 what OS is installed?  How is that harmful?

If that disk was actually in use on another system but just exported
via a SAN to this node, you've potentially corrupted the filesystem.

It's a bad idea to just go ahead and mount filesystems that you aren't
told to mount.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Add a norecovery option to ext3/4?

2007-04-09 Thread Theodore Tso
On Sun, Apr 08, 2007 at 10:42:03PM -0500, Eric Sandeen wrote:
 Samuel Thibault wrote:
 
 Hm, so the root cause there seems that the installer found 2 legs of a 
 mirror and mounted them independently, recovering them independently... 
 But why did that cause problems?
 
 Because that thrashed his data (or at least it didn't help to keep data
 safe).

Actually, reading through the Debian bug report, there is no proof
that is what actually caused the data loss.  I certainly can't think
of any explanation for why that would have happened.  See the summary
from Steve Langasek:

Checkpoint of the IRC discussion:

- The submitter says that after reboot, the RAID was reported as out of
  sync.
- The logs show that the ext3 filesystem was automatically mounted rw for
  journal recovery by the kernel driver.
- There is no evidence in the logs that the RAID was ever assembled within
  d-i, so it shouldn't be the case that the RAID superblocks were out of
  sync as a result of d-i itself.
- This leaves two possible reasons for the out-of-sync state of the RAID:
  either mounting the individual partitions as ext3 filesystems somehow
  overwrote the RAID superblock just the right way (unlikely since it would
  require the ext3 driver to write past the end of the declared filesystem),
  or the RAID superblocks were out of sync /before/ booting d-i.  The latter
  is consistent with the fact that the ext3 driver had to do a journal
  recovery, suggesting that both the ext3 fs and the RAID were not cleanly
  shut down.
- If mounting as ext3 overwrote the RAID superblock, that seems to be a
  kernel bug, and we have no good explanation for how that would happen.
- If the RAID was unclean before booting d-i, all bets are off as to the
  state of the filesystem at the beginning of this journal recovery, and it
  may be difficult to ever reproduce this bug.

 The reason I suggest other options is because intentionally mounting a 
 corrupted FS may not really be the way you want to go... norecovery on 
 xfs at least is an option of last resort, not something to use by default.

This would also be true for ext3; I am extremely uncomfortable with
people thinking that a norecovery option is something that should be
routinely used by programs.  It's something that should only be used
by experts, who know what they are doing and who are willing to accept
the potential risks.

- Ted


Re: Add a norecovery option to ext3/4?

2007-04-09 Thread Valdis . Kletnieks
On Sun, 08 Apr 2007 22:24:50 CDT, Eric Sandeen said:
 Can you elaborate?  Under what circumstances is log replay going to harm 
 data?  Do you mean that the installer mounts partitions, looking for 
 what OS is installed?  How is that harmful?

Another use case that really wants to avoid the log replay is looking
at an unknown disk image with a forensics CD such as Helix:

http://www.e-fense.com/helix/

Yes, good forensics always clones the disk image twice (the first clone being
used for nothing but creating second-generation clones for analysis), and in
most cases the forensic analyst can work around the fact that mounting *does*
cause some changes to the disk image.  But sometimes you'd rather be looking
at a possibly inconsistent image than replaying the log, particularly if
you're looking at an image seized by pulling the power plug, and you actually
care about things that may have been in the log, like just-erased files.
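
As a rough illustration of looking before mounting, the sketch below reads
the primary superblock straight from an image or device and reports whether a
journal exists and whether replay is pending. It is not taken from any
installer or forensics tool; the function name journal_state is hypothetical,
and the offsets are the standard ext2/3/4 superblock fields (s_magic,
s_feature_compat, s_feature_incompat).

```python
import struct

SUPERBLOCK_OFFSET = 1024            # primary superblock starts 1024 bytes in
EXT_MAGIC = 0xEF53                  # s_magic, shared by ext2/ext3/ext4
FEATURE_COMPAT_HAS_JOURNAL = 0x0004
FEATURE_INCOMPAT_RECOVER = 0x0004   # shown as "needs_recovery" by dumpe2fs


def journal_state(path):
    """Return (has_journal, needs_recovery) without mounting anything.

    Hypothetical helper: opens `path` read-only and only reads from it.
    """
    with open(path, "rb") as dev:
        dev.seek(SUPERBLOCK_OFFSET)
        sb = dev.read(1024)
    if len(sb) < 100:
        raise ValueError("short read: no complete superblock")
    (magic,) = struct.unpack_from("<H", sb, 56)           # s_magic
    if magic != EXT_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    compat, incompat = struct.unpack_from("<II", sb, 92)  # compat, incompat
    return (bool(compat & FEATURE_COMPAT_HAS_JOURNAL),
            bool(incompat & FEATURE_INCOMPAT_RECOVER))
```

A caller could skip, or at least flag, any partition for which the second
value is true, since mounting it is what would trigger the replay discussed
in this thread.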




Re: confused on different inode size

2007-04-09 Thread Theodore Tso
On Mon, Apr 09, 2007 at 10:33:13AM +0800, coly wrote:
 Theodore:
 
 Thanks for your replying. 
 
 Can I understand this way:
 * Though sizeof(struct ext4_inode) is 152, the real inode size on disk
 still depends on mount options.

Not mount options, but how the filesystem is formatted.  So substitute
mount with mke2fs, and that would be correct.

 * If use old inode size, the on disk inode will be 128 bytes.
 * If use new inode size(e.g. extent option in mount), the on disk inode
 will be 256, or more bytes.

s/mount/mke2fs/

And the on-disk inode size is 256, 512, or some greater power of two,
up to the filesystem blocksize.

 * If on disk inode size is 128 bytes, only first 128 bytes of struct
 ext4_inode take effects.

Well, there's no space to store the fields beyond the first 128 bytes, so
any features that require the extra inode fields can't be used.
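
To make the point about format-time inode size concrete, here is a minimal
sketch that pulls the inode size out of a raw superblock buffer and applies
the rule described above (a power of two, at least 128 bytes, no larger than
the block size). The function name inode_size_from_superblock is
hypothetical; the field offsets (s_log_block_size, s_rev_level, s_inode_size)
are the standard ext2/3/4 superblock layout, and the revision-0 fallback to
128 bytes is an assumption based on that layout rather than something stated
in this thread.

```python
import struct

EXT_MAGIC = 0xEF53
GOOD_OLD_INODE_SIZE = 128   # fixed inode size for revision-0 filesystems


def inode_size_from_superblock(sb):
    """Return (inode_size, block_size) from 1024 bytes of superblock.

    `sb` is the raw superblock, i.e. the bytes found at offset 1024 on
    the device. Hypothetical helper for illustration only.
    """
    (magic,) = struct.unpack_from("<H", sb, 56)           # s_magic
    if magic != EXT_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    (log_block_size,) = struct.unpack_from("<I", sb, 24)  # s_log_block_size
    (rev_level,) = struct.unpack_from("<I", sb, 76)       # s_rev_level
    block_size = 1024 << log_block_size
    if rev_level == 0:
        # Revision 0 has no s_inode_size field; inodes are always 128 bytes.
        return GOOD_OLD_INODE_SIZE, block_size
    (size,) = struct.unpack_from("<H", sb, 88)            # s_inode_size
    is_power_of_two = size != 0 and (size & (size - 1)) == 0
    if size < GOOD_OLD_INODE_SIZE or not is_power_of_two or size > block_size:
        raise ValueError("implausible inode size: %d" % size)
    return size, block_size
```

The value comes from the superblock written at format time, which is why no
mount option can change it.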

- Ted


Re: Interface for the new fallocate() system call

2007-04-09 Thread Paul Mackerras
Jörn Engel writes:

 Wouldn't that work be confined to fallocate()?  If I understand Heiko
 correctly, the alternative would slow s390 down for every syscall,
 including more performance-critical ones.

The alternative that Jakub suggested wouldn't slow s390 down.

Paul.


Re: confused on different inode size

2007-04-09 Thread coly
Theodore:

Thanks for the explanation. I had overlooked this detail before; it is
clearer to me now.

Best regards.

Coly

On Mon, 2007-04-09 at 11:26 -0400, Theodore Tso wrote:
 On Mon, Apr 09, 2007 at 10:33:13AM +0800, coly wrote:
  Theodore:
  
  Thanks for your replying. 
  
  Can I understand this way:
  * Though sizeof(struct ext4_inode) is 152, the real inode size on disk
  still depends on mount options.
 
 Not mount options, but how the filesystem is formatted.  So substitute
 mount with mke2fs, and that would be correct.
 
  * If use old inode size, the on disk inode will be 128 bytes.
  * If use new inode size(e.g. extent option in mount), the on disk inode
  will be 256, or more bytes.
 
 s/mount/mke2fs/
 
 And the on-disk inode size is 256, 512, or some greater power of two,
 up to the filesystem blocksize.
 
  * If on disk inode size is 128 bytes, only first 128 bytes of struct
  ext4_inode take effects.
 
 Well, there's no space to store the fields beyond the first 128, so
 any features that require the extra inode fields can't be used.
 
   - Ted



Re: Add a norecovery option to ext3/4?

2007-04-09 Thread Phillip Susi

Samuel Thibault wrote:
 Hi,
 
 Distribution installers usually try to probe OSes for building a suited
 grub menu.  Unfortunately, mounting an ext3 partition, even in read-only
 mode, does perform some operations on the filesystem (log recovery).
 This is not a good idea since it may silently garbage data.  XFS has a
 norecovery option that allows to disable that, I'd say ext3/4 should
 have it too.


When the filesystem is told to mount the disk read only, that means it 
should not write to it.  The fact that ext3 goes ahead and does anyway 
is a bug and should be fixed.  There is no need for a norecovery option, 
because read only is a sufficient directive to tell the filesystem not 
to write to the disk.


As someone else pointed out, this behavior causes havoc if you hibernate 
a system and then boot up another system which mounts the disk of the 
hibernated system.  Under all conditions it should be safe to mount a 
disk read only, but here it is not because the journal playback trashes 
the disk out from under the hibernated system.





Re: Add a norecovery option to ext3/4?

2007-04-09 Thread Kyle Moffett

On Apr 09, 2007, at 11:43:15, Phillip Susi wrote:

 Samuel Thibault wrote:
  Hi,
  Distribution installers usually try to probe OSes for building a
  suited grub menu.  Unfortunately, mounting an ext3 partition, even
  in read-only mode, does perform some operations on the filesystem
  (log recovery).  This is not a good idea since it may silently
  garbage data.  XFS has a norecovery option that allows to disable
  that, I'd say ext3/4 should have it too.
 
 When the filesystem is told to mount the disk read only, that means
 it should not write to it.  The fact that ext3 goes ahead and does
 anyway is a bug and should be fixed.  There is no need for a
 norecovery option, because read only is a sufficient directive to
 tell the filesystem not to write to the disk.
 
 As someone else pointed out, this behavior causes havoc if you
 hibernate a system and then boot up another system which mounts the
 disk of the hibernated system.  Under all conditions it should be
 safe to mount a disk read only, but here it is not because the
 journal playback trashes the disk out from under the hibernated
 system.


Well IIRC it is possible to prevent that by switching the blockdev to  
read-only mode first:


[EMAIL PROTECTED]:~# mount /dev/hda6 /mnt
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hda6, internal journal
EXT3-fs: mounted filesystem with ordered data mode
[EMAIL PROTECTED]:~# umount /mnt
[EMAIL PROTECTED]:~# blockdev --setro /dev/hda6
[EMAIL PROTECTED]:~# mount /dev/hda6 /mnt
mount: block device /dev/loop0 is write-protected, mounting read-only
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode

Cheers,
Kyle Moffett



Re: Add a norecovery option to ext3/4?

2007-04-09 Thread Jan Engelhardt

On Apr 8 2007 22:24, Eric Sandeen wrote:
 Samuel Thibault wrote:

 Can you elaborate?  Under what circumstances is log replay going to harm data?
 Do you mean that the installer mounts partitions, looking for what OS is
 installed?  How is that harmful?

 Hm, so the root cause there seems that the installer found 2 legs of a mirror
 and mounted them independently, recovering them independently... But why did
 that cause problems?

Because, for whatever unlikely reason there could possibly be, it may
have been repaired differently [depending on sunshine, daytime, rand(), 
or so]?


Jan