Re: Notes on support for multiple devices for a single filesystem

2008-12-18 Thread Bryan Henderson
 Features like the very nice and useful directory-based snapshots would
 also not be possible with simple block-based multi-devices, right?

 Snapshotting via block device has always been an incredibly dumb hack, 
 existing primarily because filesystem-based snapshots did not exist for 
 the filesystem in question.

I can see that if the filesystem driver in question could already do 
snapshots, nobody would have added snapshot function to the block device 
driver under it, but this doesn't explain why someone at some point created 
block device snapshots instead of creating them for the filesystem in 
question.

 Snapshots are better at the filesystem level because the filesystem is 
 the only entity that knows when the filesystem is quiescent and 
 snapshot-able.

You can use the same logic to say that snapshots are better at the 
application level because only the application knows when its database is 
quiescent and snapshot-able.  In fact, carrying it to the extreme, you 
could say snapshots are better done manually by the human end user, with 
no part of the computer knowing anything about it.

It probably minimizes engineering effort to have snapshot capability at 
every level, with the implementation at each level exploiting the function 
at the level below.  E.g. when someone tells a filesystem driver to 
snapshot a filesystem that resides on two block devices, the filesystem 
driver quiesces the filesystem, then snapshots each device (implemented in 
the block device driver), then resumes.  The new snapshot filesystem lives 
on the two new snapshot block devices.
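
As a rough sketch of that layered flow (all names here are hypothetical, 
not a real kernel interface):

    /* hypothetical sketch of the layered snapshot flow described above */
    struct bdev;
    struct fs {
        int          ndevs;
        struct bdev *devs[16];
    };

    /* provided by the filesystem driver (invented names) */
    void fs_quiesce(struct fs *fs);
    void fs_resume(struct fs *fs);
    /* provided by the block device driver (invented name) */
    void blockdev_snapshot(struct bdev *dev);

    int fs_snapshot(struct fs *fs)
    {
        /* 1. filesystem level: flush dirty data, hold new writes */
        fs_quiesce(fs);

        /* 2. block level: snapshot each underlying device */
        for (int i = 0; i < fs->ndevs; i++)
            blockdev_snapshot(fs->devs[i]);

        /* 3. resume; the snapshot filesystem lives on the new devices */
        fs_resume(fs);
        return 0;
    }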

Of course, if you want to do a form of snapshot that makes sense only in 
the context of a filesystem, like the directory snapshot mentioned above, 
then you can't get as much help from snapshot functions in the storage 
devices.

--
Bryan Henderson                          IBM Almaden Research Center
San Jose CA                              Storage Systems



Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Christoph Hellwig
FYI: here's a little writeup I did this summer on support for
filesystems spanning multiple block devices:


-- 

=== Notes on support for multiple devices for a single filesystem ===

== Intro ==

Btrfs (and an experimental XFS version) can support multiple underlying block
devices for a single filesystem instance in a generalized and flexible way.

Unlike the support for external log devices in ext3, jfs, reiserfs, and XFS,
and the special real-time device in XFS, all data and metadata may be spread
over a potentially large number of block devices, not just one (or two).


== Requirements ==

We want a scheme to support these complex filesystem topologies in a way
that is

 a) easy to set up and non-fragile for the users
 b) scalable to a large number of disks in the system
 c) recoverable without requiring user space running first
 d) generic enough to work for multiple filesystems or other consumers

Requirement a) means that a multiple-device filesystem should be mountable
by a simple fstab entry (UUID/LABEL or some other cookie) which continues
to work when the filesystem topology changes.
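
For example, a single fstab line of this shape (UUID invented for
illustration) should be all the user needs, no matter how many devices the
filesystem spans:

    # /etc/fstab - one entry, however many devices back the filesystem
    UUID=0a1b2c3d-0000-4000-8000-000000000000  /data  btrfs  defaults  0 0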

Requirement b) implies we must not do a scan over all available block devices
in large systems, but use an event-based callout on detection of new block
devices.

Requirement c) means there must be some way to add devices to a filesystem
from the kernel command line, even if this is not the default way, and it
might require additional knowledge from the user / system administrator.

Requirement d) means that we should not implement this mechanism inside a
single filesystem.


== Prior art ==

* External log and realtime volume

The most common way to specify the external log device and the XFS real-time
device is to have a mount option that contains the path to the block special
device for it.  This variant means a mount option is always required, and
requires that the device name doesn't change, which is achievable with
udev-generated unique device names (/dev/disk/by-{label,uuid}).
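
For example (device paths illustrative):

    # XFS with an external log and real-time device via mount options
    mount -t xfs -o logdev=/dev/sdb1,rtdev=/dev/sdc1 /dev/sda1 /mnt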

An alternative way, supported optionally by ext3 and reiserfs and
exclusively by jfs, is to open the journal device by the device number
(dev_t) of the block special device.  While this doesn't require an
additional mount option when the device number is stored in the filesystem
superblock, it relies on the device number being stable, which is getting
increasingly unlikely in complex storage topologies.
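
For reference, the dev_t in question is just the major/minor pair visible
from user space:

    /* print the device number of a block special device; this is the
     * pair stored in the superblock, and it can change when the
     * storage topology changes */
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/sysmacros.h>

    int main(void)
    {
        struct stat st;

        if (stat("/dev/sda1", &st) != 0)
            return 1;
        printf("dev_t = %u:%u\n", major(st.st_rdev), minor(st.st_rdev));
        return 0;
    }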


* RAID (MD) and LVM

Software RAID and volume managers, although not strictly filesystems,
have a very similar problem finding their devices.  The traditional
solution used for early versions of the Linux MD driver and LVM version 1
was to hook into the partition scanning code and add devices with the
right partition type to a kernel-internal list of potential RAID / LVM
devices.  This approach has the advantage of being simple to implement,
fast, reliable, and not requiring additional user space programs in the boot
process.  The downside is that it only works with specific partition table
formats that allow specifying a partition type, and doesn't work with
unpartitioned disks at all.  Recent MD setups and LVM2 thus move the scanning
to user space, typically using a command iterating over all block device
nodes and performing the format-specific scanning.  While this is more
flexible than the in-kernel scanning, it scales very badly to a large number
of block devices, and requires additional user space commands to run early
in the boot process.  A variant of this scheme runs a scanning callout
from udev once disk devices are detected, which avoids the scanning overhead.
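
The udev variant amounts to a rule of roughly this shape (the callout
binary is hypothetical):

    # run a format-specific probe on each newly added block device,
    # instead of scanning all devices at boot (callout name invented)
    ACTION=="add", SUBSYSTEM=="block", RUN+="/sbin/fsdev-probe $env{DEVNAME}"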


== High-level design considerations ==

Due to requirement b) we need a layer that finds devices for a single
fstab entry.  We can either do this in user space or in kernel space.  As
we've traditionally always done UUID/LABEL to device mapping in user space,
and we already have libvolume_id and libblkid dealing with the specialized
case of UUID/LABEL to single device mapping, I would recommend keeping
this in user space and trying to reuse libvolume_id / libblkid.

There are two options to perform the assembly of the device list for
a filesystem:

 1) whenever libvolume_id / libblkid find a device detected as a multi-device
    capable filesystem it gets added to a list of all devices of this
    particular filesystem type.
    At mount time, mount(8) or a mount.fstype helper calls out to the
    libraries to get a list of devices belonging to this filesystem
    type and translates them to device names, which can be passed to
    the kernel on the mount command line.

    Disadvantages:  Requires a mount.fstype helper or fs-specific knowledge
                    in mount(8).
                    Requires libvolume_id / libblkid to keep state.

 2) whenever libvolume_id / libblkid find a device detected as a multi-device
    capable filesystem they call into the kernel through an ioctl / sysfs /
    etc to add it to a list in kernel space.  The kernel code [...]
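
To illustrate option 2, a minimal user space sketch modeled on the
control-device ioctl btrfs uses; the ioctl and argument struct are spelled
out locally so the example is self-contained, and error handling is
abridged:

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>

    /* mirrors the shape of btrfs's scan interface (BTRFS_IOC_SCAN_DEV
     * on /dev/btrfs-control); any multi-device filesystem could expose
     * a similar control node */
    #define SCAN_NAME_MAX 4087
    struct scan_args {
        long long fd;                     /* unused for a scan request */
        char      name[SCAN_NAME_MAX + 1];
    };
    #define FS_IOC_SCAN_DEV _IOW(0x94, 4, struct scan_args)

    /* tell the kernel that devpath belongs to a multi-device fs */
    int register_member_device(const char *devpath)
    {
        struct scan_args args;
        int fd, ret;

        fd = open("/dev/btrfs-control", O_RDWR);
        if (fd < 0)
            return -1;
        memset(&args, 0, sizeof(args));
        strncpy(args.name, devpath, SCAN_NAME_MAX);
        ret = ioctl(fd, FS_IOC_SCAN_DEV, &args);
        close(fd);
        return ret;
    }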

Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Andrew Morton
On Wed, 17 Dec 2008 08:23:44 -0500
Christoph Hellwig h...@infradead.org wrote:

 [...]

 == Requirements ==
 
 We want a scheme to support these complex filesystem topologies in a way
 that is

  a) easy to set up and non-fragile for the users
  b) scalable to a large number of disks in the system
  c) recoverable without requiring user space running first
  d) generic enough to work for multiple filesystems or other consumers
 
 Requirement a) means that a multiple-device filesystem should be mountable
 by a simple fstab entry (UUID/LABEL or some other cookie) which continues
 to work when the filesystem topology changes.

device topology?

 [...]

One thing I've never seen comprehensively addressed is: why do this in
the filesystem at all?  Why not let MD take care of all this and
present a single block device to the fs layer?

Lots of filesystems are violating this, and I'm sure the reasons for
this are good, but this document seems like a suitable place in which to
briefly describe those reasons.



Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Chris Mason
On Wed, 2008-12-17 at 11:53 -0800, Andrew Morton wrote:
 On Wed, 17 Dec 2008 08:23:44 -0500
 Christoph Hellwig h...@infradead.org wrote:
 
  [...]

 One thing I've never seen comprehensively addressed is: why do this in
 the filesystem at all?  Why not let MD take care of all this and
 present a single block device to the fs layer?
 
 Lots of filesystems are violating this, and I'm sure the reasons for
 this are good, but this document seems like a suitable place in which to
 briefly describe those reasons.

I'd almost rather see this doc stick to the device topology interface in
hopes of describing something that RAID and MD can use too.  But just to
toss some information into the pool:

* When moving data around (raid rebuild, restripe, pvmove etc), we want
to make sure the data read off the disk is correct before writing it to
the new location (checksum verification).

* When moving data around, we don't want to move data that isn't
actually used by the filesystem.  This could be solved via new APIs, but
keeping it crash safe would be very tricky.

* When checksum verification fails on read, the FS should be able to ask
the raid implementation for another copy (see the sketch after this list).
This could be solved via new APIs.

* Different parts of the filesystem might want different underlying raid
parameters.  The easiest example is metadata vs data, where a 4k
stripesize for data might be a bad idea, and a 64k stripesize for
metadata would result in many more read-modify-write (rmw) cycles.

* Sharing the filesystem transaction layer.  LVM and MD have to pretend
they are a single consistent array of bytes all the time, for each and
every write they return as complete to the FS.
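
A hedged sketch of the read-retry idea above (all types and helpers are
invented for illustration, not btrfs's actual API):

    #include <errno.h>
    #include <stdint.h>

    struct extent {
        uint64_t logical;   /* logical start of the extent */
        uint32_t len;       /* length in bytes */
        uint32_t csum;      /* checksum stored in fs metadata */
    };

    /* hypothetical raid-layer helpers */
    int      raid_num_copies(uint64_t logical);
    int      raid_read_copy(uint64_t logical, uint32_t len, int copy, void *buf);
    uint32_t csum_block(const void *buf, uint32_t len);

    /* on checksum failure, ask the raid layer for the next mirror */
    int fs_read_extent(const struct extent *e, void *buf)
    {
        int copies = raid_num_copies(e->logical);

        for (int c = 0; c < copies; c++) {
            if (raid_read_copy(e->logical, e->len, c, buf) < 0)
                continue;            /* I/O error, try the next copy */
            if (csum_block(buf, e->len) == e->csum)
                return 0;            /* verified good copy */
        }
        return -EIO;                 /* every copy failed verification */
    }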

By pushing the multiple device support up into the filesystem, I can
share the filesystem's transaction layer.  Work can be done in larger
atomic units, and the filesystem will stay consistent because it is all
coordinated.

There are other bits and pieces like high speed front end caching
devices that would be difficult in MD/LVM, but since I don't have that
coded yet I suppose they don't really count...

-chris





Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Kay Sievers
On Wed, Dec 17, 2008 at 21:58, Chris Mason chris.ma...@oracle.com wrote:
 [...]

 There are other bits and pieces like high speed front end caching
 devices that would be difficult in MD/LVM, but since I don't have that
 coded yet I suppose they don't really count...

Features like the very nice and useful directory-based snapshots would
also not be possible with simple block-based multi-devices, right?

Kay


Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Chris Mason
On Wed, 2008-12-17 at 22:20 +0100, Kay Sievers wrote:
 On Wed, Dec 17, 2008 at 21:58, Chris Mason chris.ma...@oracle.com wrote:
 
  There are other bits and pieces like high speed front end caching
  devices that would be difficult in MD/LVM, but since I don't have that
  coded yet I suppose they don't really count...
 
 Features like the very nice and useful directory-based snapshots would
 also not be possible with simple block-based multi-devices, right?

At least for btrfs, the snapshotting is independent from the
multi-device code, and you still get snapshotting on single device
filesystems.

-chris




Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Jeff Garzik

Kay Sievers wrote:
 Features like the very nice and useful directory-based snapshots would
 also not be possible with simple block-based multi-devices, right?

Snapshotting via block device has always been an incredibly dumb hack, 
existing primarily because filesystem-based snapshots did not exist for 
the filesystem in question.


Snapshots are better at the filesystem level because the filesystem is 
the only entity that knows when the filesystem is quiescent and 
snapshot-able.


ISTR we had to add ->write_super_lockfs() to hack in support for LVM in 
this manner, rather than doing it the right way.
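
For context, the hook in question as it sat in struct super_operations in
2.6-era kernels (abridged from include/linux/fs.h; the pair was later
renamed freeze_fs/unfreeze_fs):

    struct super_block;

    struct super_operations {
        /* ... other methods omitted ... */
        void (*write_super_lockfs) (struct super_block *);
        void (*unlockfs) (struct super_block *);
    };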


Jeff




Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Chris Mason
On Wed, 2008-12-17 at 14:24 -0700, Andreas Dilger wrote:

 I can't speak for btrfs, but I don't think multiple device access from
 the filesystem is a layering violation as some people comment.  It is
 just a different type of layering.  With ZFS there is a distinct layer
 that is handling the allocation, redundancy, and transactions (SPA, DMU)
 that is exporting an object interface, and the filesystem (ZPL, or future
 versions of Lustre) is built on top of that object interface.

Clean interfaces aren't really my best talent, but btrfs also layers
this out.  Logical->physical mappings happen in a centralized function,
and all of the on-disk structures use logical block numbers.

The only exception to that rule is the superblock offsets on the device.

-chris




Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Andreas Dilger
On Dec 17, 2008  08:23 -0500, Christoph Hellwig wrote:
 [...]
 
 An alternative way, supported optionally by ext3 and reiserfs and
 exclusively by jfs, is to open the journal device by the device number
 (dev_t) of the block special device.  While this doesn't require an
 additional mount option when the device number is stored in the filesystem
 superblock, it relies on the device number being stable, which is getting
 increasingly unlikely in complex storage topologies.

Just as an FYI here - the dev_t stored in the ext3/4 superblock for the
journal device is only a cached value.  The journal is properly
identified by its UUID, and should the device mapping change there is a
journal_dev= option that can be used to specify the new device.  The
one shortcoming is that there is no mount.ext3 helper which does this 
journal UUID->dev mapping and automatically passes journal_dev= if
needed.
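
For example (device numbers illustrative; devnum here is assumed to be the
kernel's encoded major/minor pair, e.g. 2065 == 0x811 for major 8,
minor 17):

    # re-point ext3 at an external journal whose device number changed
    mount -t ext3 -o journal_dev=2065 /dev/sda1 /mnt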

 * RAID (MD) and LVM
 
 Recent MD setups and LVM2 thus move the scanning to user space, typically
 using a command iterating over all block device nodes and performing the
 format-specific scanning.  While this is more flexible than the in-kernel
 scanning, it scales very badly to a large number of block devices, and
 requires additional user space commands to run early in the boot process.
 A variant of this scheme runs a scanning callout from udev once disk
 devices are detected, which avoids the scanning overhead.

My (admittedly somewhat vague) impression is that with large numbers of
devices the udev callout can itself be a huge overhead because this involves
a userspace fork/exec for each new device being added.  For the same
number of devices, a single scan from userspace only requires a single
process, and an equal number of device probes.

Added to this, the blkid cache can be used to eliminate the need to do
any scanning if the devices have not changed since the previous boot,
which makes it unclear which mechanism is more efficient.  The drawback
is that the initrd device cache is never going to be up-to-date, so it
wouldn't be useful until the root partition is mounted.
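
For example, resolving a UUID through the existing blkid cache, with no
device scan needed while the cache is current (UUID value invented):

    #include <stdio.h>
    #include <blkid/blkid.h>

    int main(void)
    {
        blkid_cache cache;
        char *dev;

        /* NULL selects the default cache file (e.g. /etc/blkid.tab) */
        if (blkid_get_cache(&cache, NULL) < 0)
            return 1;

        /* answered from the cache if devices are unchanged */
        dev = blkid_get_devname(cache, "UUID",
                                "0a1b2c3d-0000-4000-8000-000000000000");
        if (dev)
            printf("%s\n", dev);

        blkid_put_cache(cache);
        return 0;
    }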

We've used blkid for our testing of Lustre-on-DMU with up to 48 (local)
disks w/o any kind of performance issues.  We'll eventually be able to
test on systems with around 400 disks in a JBOD configuration, but until
then we only run on systems with hundreds of disks behind a RAID controller.

 [...]
 
 There are two options to perform the assembly of the device list for
 a filesystem:

  1) whenever libvolume_id / libblkid find a device detected as a multi-device
     capable filesystem it gets added to a list of all devices of this
     particular filesystem type.
     At mount time, mount(8) or a mount.fstype helper calls out to the
     libraries to get a list of devices belonging to this filesystem
     type and translates them to device names, which can be passed to
     the kernel on the mount command line.

I would actually suggest that instead of keeping devices in groups by
filesystem type, we keep a list of devices with the same UUID and/or
LABEL, and if the mount is looking for this UUID/LABEL it gets the
whole list of matching devices back.

This could also be done in the kernel by having the filesystems register
a probe function that examines the device/partitions as they are added,
similar to the way that MD used to do it.  There would likely be very few
probe functions needed, only ext3/4 (for journal devices), btrfs, and
maybe MD, LVM2 and a handful more.

If we wanted to avoid code duplication, this could share code between
libblkid and the kernel (just the enhanced probe-only functions in the
util-linux-ng implementation), since these functions are little more than:
take a pointer, cast it to struct X, check some magic fields, and return
match + {LABEL, UUID} or no-match.
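
A minimal sketch of a probe-only helper of that shape; the superblock
layout (magic value, field offsets) is invented for illustration, not any
real filesystem's format:

    #include <stdint.h>
    #include <string.h>

    /* illustrative on-disk superblock; real filesystems differ */
    struct example_sb {
        uint8_t magic[8];      /* identifies the filesystem type */
        uint8_t fs_uuid[16];   /* identifies the filesystem instance */
        char    label[32];
    };

    /* returns 1 and fills uuid/label on match, 0 on no-match */
    int example_probe(const void *buf, uint8_t uuid[16], char label[32])
    {
        const struct example_sb *sb = buf;

        if (memcmp(sb->magic, "EXAMPLFS", 8) != 0)
            return 0;
        memcpy(uuid, sb->fs_uuid, 16);
        memcpy(label, sb->label, 32);
        return 1;
    }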

That MD used to check only the partition type doesn't mean that we can't
have simple functions that read the superblock (or equivalent) to make
an internal list of suitable devices attached to a filesystem-type global
structure (possibly split into per-fsUUID sublists if it wants).  

Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Dave Kleikamp
On Wed, 2008-12-17 at 15:04 -0700, Andreas Dilger wrote:
 On Dec 17, 2008  08:23 -0500, Christoph Hellwig wrote:

  An alternative way, supported optionally by ext3 and reiserfs and
  exclusively by jfs, is to open the journal device by the device number
  (dev_t) of the block special device.  While this doesn't require an
  additional mount option when the device number is stored in the
  filesystem superblock, it relies on the device number being stable,
  which is getting increasingly unlikely in complex storage topologies.
 
 Just as an FYI here - the dev_t stored in the ext3/4 superblock for the
 journal device is only a cached value.  The journal is properly
 identified by its UUID, and should the device mapping change there is a
 journal_dev= option that can be used to specify the new device.  The
 one shortcoming is that there is no mount.ext3 helper which does this
 journal UUID->dev mapping and automatically passes journal_dev= if
 needed.

An additional FYI:  JFS also treats the dev_t in its superblock the same
way.  Since jfs relies on jfs_fsck running at boot time to ensure that
the journal is replayed, jfs_fsck makes sure that the dev_t is accurate.
If not, it scans all of the block devices until it finds the UUID
of the journal device, updating the superblock so that the kernel will
find the journal.

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center
