Re: Q: Why subvolumes?

2013-08-04 Thread Alexandre Oliva
On Jul 23, 2013, Jerome Haltom was...@cogito.cx wrote:

 Why not just create the new dev_id on the destination snapshot of any
 directory? That way the snapshot can share inodes with its source.

Agreed.  Nothing stops us from implementing snapshotting of any
directory whatsoever: all it takes is to take a snapshot of the
subvolume enclosing the directory we want to snapshot, remove
everything that's not in the requested directory from the snapshot, and
make that directory the root of the snapshot.  The only tricky bit
here, AFAICT, is arranging for the non-snapshotted subtree components
to be cleaned up in the background.  If we had some primitive to unlink
an entire subtree and clean it up in the background, we could use that.
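
The three steps above (snapshot the enclosing subvolume, prune, re-root)
can be sketched as a toy model.  This is only an illustration and not
btrfs code: the function name snapshot_directory is hypothetical, a
nested dict stands in for the subvolume, and a deep copy stands in for
the CoW snapshot a real filesystem would take.

```python
import copy


def snapshot_directory(subvolume, path):
    """Toy model: snapshot the enclosing subvolume, prune, then re-root.

    `subvolume` is a nested dict (directories are dicts, files are strings).
    `path` is a list of directory names leading to the directory to snapshot.
    """
    # Step 1: snapshot the whole enclosing subvolume. A deep copy stands in
    # for the cheap CoW snapshot (assumption of this toy model).
    snap = copy.deepcopy(subvolume)

    # Step 2: walk down to the requested directory, unlinking every sibling
    # on the way. This is the cleanup that would have to run in background.
    node = snap
    for name in path:
        for sibling in [key for key in node if key != name]:
            del node[sibling]
        node = node[name]

    # Step 3: the requested directory becomes the root of the snapshot.
    return node


if __name__ == "__main__":
    vol = {"home": {"alice": {"notes.txt": "v1"}, "bob": {}}, "etc": {}}
    print(snapshot_directory(vol, ["home", "alice"]))
```

The original dict is left untouched, which mirrors the requirement that
the source subvolume keeps everything the snapshot drops.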

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist  Red Hat Brazil Compiler Engineer
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Q: Why subvolumes?

2013-07-23 Thread Jerome Haltom
May I ask why the decision to implement snapshotting through
subvolumes? I've been very curious about why the design wasn't to
simply allow snapshotting of any directory or file.


Re: Q: Why subvolumes?

2013-07-23 Thread Gabriel de Perthuis
Now... since the snapshot's FS tree is a direct duplicate of the
 original FS tree (actually, it's the same tree, but they look like
 different things to the outside world), they share everything --
 including things like inode numbers. This is OK within a subvolume,
 because we have the semantics that subvolumes have their own distinct
 inode-number spaces. If we could snapshot arbitrary subsections of the
 FS, we'd end up having to fix up inode numbers to ensure that they
 were unique -- which can't really be an atomic operation (unless you
 want to have the FS locked while the kernel updates the inodes of the
 billion files you just snapshotted).

I don't think so; I just checked some snapshots and the inos are the same.
Btrfs just changes the dev_id of subvolumes (somehow the vfs allows this).

The other thing to talk about here is that while the FS tree is a
 tree structure, it's not a direct one-to-one map to the directory tree
 structure. In fact, it looks more like a list of inodes, in inode
 order, with some extra info for easily tracking through the list. The
 B-tree structure of the FS tree is just a fast indexing method. So
 snapshotting a directory entry within the FS tree would require
 (somehow) making an atomic copy, or CoW copy, of only the parts of the
 FS tree that fall under the directory in question -- so you'd end up
 trying to take a sequence of records in the FS tree, of arbitrary size
 (proportional roughly to the number of entries in the directory) and
 copying them to somewhere else in the same tree in such a way that you
 can automatically dereference the copies when you modify them. So,
 ultimately, it boils down to being able to do CoW operations at the
 byte level, which is going to introduce huge quantities of extra
 metadata, and it all starts looking really awkward to implement (plus
 having to deal with the long time taken to copy the directory entries
 for the thing you're snapshotting).

Btrfs already does CoW of arbitrarily-large files (extent lists);
doing the same for directories doesn't seem impossible.


Re: Q: Why subvolumes?

2013-07-23 Thread Hugo Mills
On Tue, Jul 23, 2013 at 07:47:41PM +0200, Gabriel de Perthuis wrote:
 Now... since the snapshot's FS tree is a direct duplicate of the
  original FS tree (actually, it's the same tree, but they look like
  different things to the outside world), they share everything --
  including things like inode numbers. This is OK within a subvolume,
  because we have the semantics that subvolumes have their own distinct
  inode-number spaces. If we could snapshot arbitrary subsections of the
  FS, we'd end up having to fix up inode numbers to ensure that they
  were unique -- which can't really be an atomic operation (unless you
  want to have the FS locked while the kernel updates the inodes of the
  billion files you just snapshotted).
 
 I don't think so; I just checked some snapshots and the inos are the same.
 Btrfs just changes the dev_id of subvolumes (somehow the vfs allows this).

   That's what I said. Our current implementation allows different
subvolumes to have the same inode numbers, which is what makes it
work. If you threw out the concept of subvolumes, or allowed snapshots
within subvolumes, then you'd be duplicating inodes within a
subvolume, which is one reason it doesn't work.

   Hugo.
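
To illustrate the point about duplicated inodes: many userspace tools
treat the pair (st_dev, st_ino) as a file's identity, for example to
detect hard links.  The sketch below is a toy model under that
assumption, not btrfs code; the function name, the paths and the device
numbers are all made up for illustration.

```python
from collections import defaultdict


def group_by_identity(entries):
    """Group paths that a (st_dev, st_ino) based tool would treat as one file.

    `entries` is an iterable of (path, st_dev, st_ino) tuples.
    Returns only the identities that are claimed by more than one path.
    """
    groups = defaultdict(list)
    for path, dev, ino in entries:
        groups[(dev, ino)].append(path)
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    # Snapshot in its own subvolume: same inode number, different device.
    separate = [("/vol/a/file", 40, 257), ("/vol/snap/file", 41, 257)]
    # Hypothetical snapshot inside one subvolume: both numbers collide.
    inside = [("/vol/a/file", 40, 257), ("/vol/a/snap/file", 40, 257)]
    print(group_by_identity(separate))
    print(group_by_identity(inside))
```

In the second case two independent files would be reported as the same
file, which is the kind of breakage duplicated inodes within a single
subvolume would cause.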

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Unix: For controlling fungal diseases in crops. --- 




Re: Q: Why subvolumes?

2013-07-23 Thread Gabriel de Perthuis
On Tue, 23 Jul 2013 21:30:13 CEST, Hugo Mills wrote:
 On Tue, Jul 23, 2013 at 07:47:41PM +0200, Gabriel de Perthuis wrote:
Now... since the snapshot's FS tree is a direct duplicate of the
 original FS tree (actually, it's the same tree, but they look like
 different things to the outside world), they share everything --
 including things like inode numbers. This is OK within a subvolume,
 because we have the semantics that subvolumes have their own distinct
 inode-number spaces. If we could snapshot arbitrary subsections of the
 FS, we'd end up having to fix up inode numbers to ensure that they
 were unique -- which can't really be an atomic operation (unless you
 want to have the FS locked while the kernel updates the inodes of the
 billion files you just snapshotted).

 I don't think so; I just checked some snapshots and the inos are the same.
 Btrfs just changes the dev_id of subvolumes (somehow the vfs allows this).

That's what I said. Our current implementation allows different
 subvolumes to have the same inode numbers, which is what makes it
 work. If you threw out the concept of subvolumes, or allowed snapshots
 within subvolumes, then you'd be duplicating inodes within a
 subvolume, which is one reason it doesn't work.

Sorry for misreading you.
Directory snapshots can work by giving a new device number to the snapshot.
There is no need to update inode numbers in that case.
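
The check mentioned earlier (same inode numbers, different device
number) can be reproduced from userspace with stat.  A minimal sketch:
os.stat and its st_dev / st_ino fields are standard, while the two
helper names are hypothetical.

```python
import os


def file_identity(path):
    """Return the (st_dev, st_ino) pair that userspace sees for `path`."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def looks_like_snapshot_pair(original, snapshot_copy):
    """True if both paths have the same inode number but a different st_dev."""
    dev_a, ino_a = file_identity(original)
    dev_b, ino_b = file_identity(snapshot_copy)
    return ino_a == ino_b and dev_a != dev_b
```

Calling looks_like_snapshot_pair on a file in a subvolume and on the
same file inside a snapshot of that subvolume is the comparison
described above; on two paths in the same subvolume it returns False.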


Re: Q: Why subvolumes?

2013-07-23 Thread Jerome Haltom
Why not just create the new dev_id on the destination snapshot of any
directory? That way the snapshot can share inodes with its source.



Re: Q: Why subvolumes?

2013-07-23 Thread Chris Murphy

On Jul 23, 2013, at 1:43 PM, Jerome Haltom was...@cogito.cx wrote:

 Why not just create the new dev_id on the destination snapshot of any
 directory?

Right now, snapshots of subvolumes do not contain the contents of contained 
subvolumes. Hmmm, that sounds horrid.

Subvolume A
    File 1
    File 2
    Subvolume B
        File 3
        File 4

If I snapshot subvolume A, the resulting snapshot does not contain File 3 and 
File 4. Subvolume B is a regular folder in the snapshot of Subvolume A.

So if every directory were a subvolume by default, this limitation would need 
to be resolved or snapshotting would become useless. I'm sure there's a more 
coherent explanation why this isn't desired.
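
The behaviour described above can be modelled in a few lines.  This is
a toy sketch, not btrfs code: the Subvolume class and the snapshot
function are hypothetical, and a nested subvolume is replaced by an
empty directory placeholder in the snapshot.

```python
import copy


class Subvolume:
    """Toy subvolume: a name plus a dict of entries (files or Subvolumes)."""

    def __init__(self, name, entries=None):
        self.name = name
        self.entries = entries if entries is not None else {}


def snapshot(subvol, new_name):
    """Snapshot one subvolume without descending into nested subvolumes."""
    entries = {}
    for name, value in subvol.entries.items():
        if isinstance(value, Subvolume):
            # A nested subvolume is a separate tree, so the snapshot only
            # gets an empty directory in its place.
            entries[name] = {}
        else:
            entries[name] = copy.deepcopy(value)
    return Subvolume(new_name, entries)


if __name__ == "__main__":
    b = Subvolume("B", {"File 3": "c", "File 4": "d"})
    a = Subvolume("A", {"File 1": "a", "File 2": "b", "B": b})
    snap = snapshot(a, "A-snap")
    print(sorted(snap.entries), snap.entries["B"])
```

The snapshot keeps File 1 and File 2, while File 3 and File 4 stay
reachable only through the original Subvolume B.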

 That way the snapshot can share inodes with its source.


Snapshots already share inode numbers.


Chris Murphy


Re: Q: Why subvolumes?

2013-07-23 Thread Jerome Haltom
Yeah. I was merely curious about the architecture limits that drove
the design this way, to begin with. Mostly because it seems odd. It
seems like the most obvious and most natural thing from the user's
perspective to do would just be able to reflink directories. Like
every decent source control system that exists, for instance. So, I
figured there must be some very good reason it wasn't done like that.

I'm still not completely sure what that very good reason is. Obviously
whatever structure that currently exists for subvolumes would need to
continue existing, to begin a unique inode scope. But, since
apparently the VFS can be instructed to plop a new dev_id anywhere in
the hierarchy, I still don't see why explicit subvolumes are
required. Seems more natural to be able to put a quota on a directory.
To be able to set raid policy on a directory. Compression on a
directory. COW semantics on a directory. Etc.

Ahh well, some of you gave really nice detailed answers, and I
appreciate that. Thanks.



Re: Q: Why subvolumes?

2013-07-23 Thread Josef Bacik
On Tue, Jul 23, 2013 at 06:39:57PM -0500, Jerome Haltom wrote:
 Yeah. I was merely curious about the architecture limits that drove
 the design this way, to begin with. Mostly because it seems odd. It
 seems like the most obvious and most natural thing from the user's
 perspective to do would just be able to reflink directories. Like
 every decent source control system that exists, for instance. So, I
 figured there must be some very good reason it wasn't done like that.
 
 I'm still not completely sure what that very good reason is. Obviously
 whatever structure that currently exists for subvolumes would need to
 continue existing, to begin a unique inode scope. But, since
 apparently the VFS can be instructed to plop a new dev_id anywhere in
 the hierarchy, I still don't see why explicit subvolumes are
 required. Seems more natural to be able to put a quota on a directory.
 To be able to set raid policy on a directory. Compression on a
 directory. COW semantics on a directory. Etc.
 
 Ahh well, some of you gave really nice detailed answers, and I
 appreciate that. Thanks.


Subvolumes are described as directories simply to make them easier to understand.
Directories do not change the hierarchy within the file system itself; they are
simply items in the b-tree like anything else, and they are not special at all.
Subvolumes are _represented_ as directories, but really the directories are just
links to subvolumes.  Each subvolume is a completely separate b-tree, with its
own locking, its own inode numbering and everything.  And this isn't inode
numbering for the sake of inode numbering: our inode numbers are picked by
simply taking the next largest objectid we can add to our tree.  Since a
subvolume is its own tree, its inode numbers start over at the beginning.

So it's not that we can just fork off a directory and snapshot there, because
it's not a tree, it's just an item.  A subvolume is its own tree, which can be
snapshotted and locked independently from the other subvolumes.  Thanks,

Josef 
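
The allocation rule described above (next largest objectid, numbering
restarting in every subvolume tree) can be sketched as a toy model.
This is not the kernel implementation: the SubvolumeTree class is
hypothetical, a plain dict stands in for the b-tree, and the starting
value of 256 is an assumption chosen for the illustration.

```python
FIRST_FREE_OBJECTID = 256  # assumed starting value for this toy model


class SubvolumeTree:
    """Toy subvolume: its own tree with its own inode-number space."""

    def __init__(self):
        self.items = {}  # objectid (inode number) -> name

    def create(self, name):
        """Allocate the next objectid: one above the largest in this tree."""
        objectid = max(self.items, default=FIRST_FREE_OBJECTID - 1) + 1
        self.items[objectid] = name
        return objectid

    def snapshot(self):
        """A snapshot is a new tree holding the same items and numbers."""
        snap = SubvolumeTree()
        snap.items = dict(self.items)
        return snap


if __name__ == "__main__":
    a = SubvolumeTree()
    print(a.create("x"), a.create("y"))  # numbering within tree a
    b = SubvolumeTree()
    print(b.create("z"))                 # a fresh tree starts over
    s = a.snapshot()
    print(s.items == a.items)            # snapshot shares inode numbers
```

Because each tree allocates independently, a snapshot and its source
can both hand out the same new number later without conflict, as long
as they remain separate trees.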


Re: Q: Why subvolumes?

2013-07-23 Thread Chris Murphy

On Jul 23, 2013, at 7:27 PM, Josef Bacik jba...@fusionio.com wrote:
 
 Subvolumes are described as directories simply to make them easier to understand.
 Directories do not change the hierarchy within the file system itself; they are
 simply items in the b-tree like anything else, and they are not special at all.
 Subvolumes are _represented_ as directories, but really the directories are just
 links to subvolumes.  Each subvolume is a completely separate b-tree, with its
 own locking, its own inode numbering and everything.  And this isn't inode
 numbering for the sake of inode numbering: our inode numbers are picked by
 simply taking the next largest objectid we can add to our tree.  Since a
 subvolume is its own tree, its inode numbers start over at the beginning.
 
 So it's not that we can just fork off a directory and snapshot there, because
 it's not a tree, it's just an item.  A subvolume is its own tree, which can be
 snapshotted and locked independently from the other subvolumes.  Thanks,


I like this, it's useful. Could it be integrated into the Wiki?


Chris Murphy