Re: Q: Why subvolumes?
On Jul 23, 2013, Jerome Haltom was...@cogito.cx wrote:

> Why not just create the new dev_id on the destination snapshot of any directory? That way the snapshot can share inodes with its source.

Agreed. Nothing stops us from implementing snapshotting of any directory whatsoever: all it takes is to snapshot the subvolume enclosing the directory we want, remove everything that's not in the requested directory from the snapshot, and make that directory the root of the snapshot.

The only tricky bit here, AFAICT, is arranging for the non-snapshotted subtree components to be cleaned up in the background. If we had some primitive to unlink an entire subtree and clean it up in the background, we could use that.

--
Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/ FSF Latin America board member
Free Software Evangelist Red Hat Brazil Compiler Engineer

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
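As a rough illustration of the three steps described above (snapshot the enclosing subvolume, drop everything outside the requested directory, make that directory the root), here is a toy Python sketch. It models a subvolume as nested dicts and uses a deep copy to stand in for a CoW snapshot. The names `snapshot_directory` and `pending_cleanup` are hypothetical and are not btrfs interfaces; the list merely stands in for whatever background-cleanup primitive would be needed.

```python
import copy


def snapshot_directory(subvolume, path):
    """Toy model: snapshot a whole subvolume, then prune down to one directory.

    `subvolume` is a nested dict (directories are dicts, files are strings).
    `path` is a list of directory names leading to the directory to keep.
    Returns (new_root, pending_cleanup).
    """
    # Step 1: snapshot the enclosing subvolume (deep copy stands in for CoW).
    snap = copy.deepcopy(subvolume)

    # Step 2: walk down the path, detaching every sibling subtree on the way.
    # Detached subtrees are collected so they could be freed in the background.
    pending_cleanup = []
    node = snap
    for part in path:
        for name in list(node):
            if name != part:
                pending_cleanup.append(node.pop(name))
        node = node[part]

    # Step 3: the requested directory becomes the root of the snapshot.
    return node, pending_cleanup


def demo():
    subvol = {
        "home": {"alice": {"notes.txt": "a"}, "bob": {"b.txt": "b"}},
        "etc": {"fstab": "x"},
    }
    root, cleanup = snapshot_directory(subvol, ["home", "alice"])
    return subvol, root, cleanup
```

In this model the source subvolume is left untouched, and the cost of the operation is dominated by the pruning step, which is the part the message suggests pushing into the background.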
Q: Why subvolumes?
May I ask why the decision was made to implement snapshotting through subvolumes? I've been very curious about why the design wasn't simply to allow snapshotting of any directory or file.
Re: Q: Why subvolumes?
> Now... since the snapshot's FS tree is a direct duplicate of the original FS tree (actually, it's the same tree, but they look like different things to the outside world), they share everything -- including things like inode numbers. This is OK within a subvolume, because we have the semantics that subvolumes have their own distinct inode-number spaces. If we could snapshot arbitrary subsections of the FS, we'd end up having to fix up inode numbers to ensure that they were unique -- which can't really be an atomic operation (unless you want to have the FS locked while the kernel updates the inodes of the billion files you just snapshotted).

I don't think so; I just checked some snapshots and the inos are the same. Btrfs just changes the dev_id of subvolumes (somehow the vfs allows this).

> The other thing to talk about here is that while the FS tree is a tree structure, it's not a direct one-to-one map to the directory tree structure. In fact, it looks more like a list of inodes, in inode order, with some extra info for easily tracking through the list. The B-tree structure of the FS tree is just a fast indexing method.
>
> So snapshotting a directory entry within the FS tree would require (somehow) making an atomic copy, or CoW copy, of only the parts of the FS tree that fall under the directory in question -- so you'd end up trying to take a sequence of records in the FS tree, of arbitrary size (proportional roughly to the number of entries in the directory) and copying them to somewhere else in the same tree in such a way that you can automatically dereference the copies when you modify them.
>
> So, ultimately, it boils down to being able to do CoW operations at the byte level, which is going to introduce huge quantities of extra metadata, and it all starts looking really awkward to implement (plus having to deal with the long time taken to copy the directory entries for the thing you're snapshotting).

Btrfs already does CoW of arbitrarily large files (extent lists); doing the same for directories doesn't seem impossible.
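For readers following the dev_id point: userspace generally identifies a file by the pair (st_dev, st_ino) rather than by the inode number alone, which is why two snapshots can report the same inode numbers without clashing as long as the device number differs. Below is a minimal Python sketch of that identity check using only `os.stat`. The helper names are made up for illustration, and the demo uses a hard link in an ordinary temporary directory rather than real btrfs snapshots.

```python
import os
import tempfile


def file_identity(path):
    """Return the (device number, inode number) pair reported by stat()."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def same_file(a, b):
    """Two paths name the same file only if both device and inode match."""
    return file_identity(a) == file_identity(b)


def demo():
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original")
        hardlink = os.path.join(d, "hardlink")
        other = os.path.join(d, "other")
        for p in (original, other):
            with open(p, "w") as f:
                f.write("data")
        # A hard link is a second name for the same inode on the same device.
        os.link(original, hardlink)
        return same_file(original, hardlink), same_file(original, other)
```

On a btrfs mount, running `file_identity` on a file and on its counterpart in a snapshot would be expected to show equal inode numbers and different device numbers, which matches the observation in the message above.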
Re: Q: Why subvolumes?
On Tue, Jul 23, 2013 at 07:47:41PM +0200, Gabriel de Perthuis wrote:

> > Now... since the snapshot's FS tree is a direct duplicate of the original FS tree (actually, it's the same tree, but they look like different things to the outside world), they share everything -- including things like inode numbers. This is OK within a subvolume, because we have the semantics that subvolumes have their own distinct inode-number spaces. If we could snapshot arbitrary subsections of the FS, we'd end up having to fix up inode numbers to ensure that they were unique -- which can't really be an atomic operation (unless you want to have the FS locked while the kernel updates the inodes of the billion files you just snapshotted).
>
> I don't think so; I just checked some snapshots and the inos are the same. Btrfs just changes the dev_id of subvolumes (somehow the vfs allows this).

That's what I said. Our current implementation allows different subvolumes to have the same inode numbers, which is what makes it work. If you threw out the concept of subvolumes, or allowed snapshots within subvolumes, then you'd be duplicating inodes within a subvolume, which is one reason it doesn't work.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Unix: For controlling fungal diseases in crops. ---
Re: Q: Why subvolumes?
On Tue, 23 Jul 2013 21:30:13 CEST, Hugo Mills wrote:

> On Tue, Jul 23, 2013 at 07:47:41PM +0200, Gabriel de Perthuis wrote:
>
> > > Now... since the snapshot's FS tree is a direct duplicate of the original FS tree (actually, it's the same tree, but they look like different things to the outside world), they share everything -- including things like inode numbers. This is OK within a subvolume, because we have the semantics that subvolumes have their own distinct inode-number spaces. If we could snapshot arbitrary subsections of the FS, we'd end up having to fix up inode numbers to ensure that they were unique -- which can't really be an atomic operation (unless you want to have the FS locked while the kernel updates the inodes of the billion files you just snapshotted).
> >
> > I don't think so; I just checked some snapshots and the inos are the same. Btrfs just changes the dev_id of subvolumes (somehow the vfs allows this).
>
> That's what I said. Our current implementation allows different subvolumes to have the same inode numbers, which is what makes it work. If you threw out the concept of subvolumes, or allowed snapshots within subvolumes, then you'd be duplicating inodes within a subvolume, which is one reason it doesn't work.

Sorry for misreading you. Directory snapshots can work by giving a new device number to the snapshot. There is no need to update inode numbers in that case.
Re: Q: Why subvolumes?
Why not just create the new dev_id on the destination snapshot of any directory? That way the snapshot can share inodes with its source.

On Tue, Jul 23, 2013 at 2:30 PM, Hugo Mills h...@carfax.org.uk wrote:

> On Tue, Jul 23, 2013 at 07:47:41PM +0200, Gabriel de Perthuis wrote:
>
> > > Now... since the snapshot's FS tree is a direct duplicate of the original FS tree (actually, it's the same tree, but they look like different things to the outside world), they share everything -- including things like inode numbers. This is OK within a subvolume, because we have the semantics that subvolumes have their own distinct inode-number spaces. If we could snapshot arbitrary subsections of the FS, we'd end up having to fix up inode numbers to ensure that they were unique -- which can't really be an atomic operation (unless you want to have the FS locked while the kernel updates the inodes of the billion files you just snapshotted).
> >
> > I don't think so; I just checked some snapshots and the inos are the same. Btrfs just changes the dev_id of subvolumes (somehow the vfs allows this).
>
> That's what I said. Our current implementation allows different subvolumes to have the same inode numbers, which is what makes it work. If you threw out the concept of subvolumes, or allowed snapshots within subvolumes, then you'd be duplicating inodes within a subvolume, which is one reason it doesn't work.
>
> Hugo.
Re: Q: Why subvolumes?
On Jul 23, 2013, at 1:43 PM, Jerome Haltom was...@cogito.cx wrote:

> Why not just create the new dev_id on the destination snapshot of any directory?

Right now, snapshots of subvolumes do not contain the contents of contained subvolumes. Hmmm, that sounds horrid.

Subvolume A
    File 1
    File 2
    Subvolume B
        File 3
        File 4

If I snapshot Subvolume A, the resulting snapshot does not contain File 3 and File 4. Subvolume B is a regular folder in the snapshot of Subvolume A.

So if every directory were a subvolume by default, this limitation would need to be resolved or snapshotting would become useless. I'm sure there's a more coherent explanation of why this isn't desired.

> That way the snapshot can share inodes with its source.

Snapshots already share inode numbers.

Chris Murphy
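The Subvolume A / Subvolume B example above can be mimicked with a small toy model in Python. This is only a sketch of the behaviour described in the message (a snapshot stops at nested subvolume boundaries and leaves a plain, empty directory in their place); the `Subvolume` class and `snapshot` function are hypothetical names and do not correspond to btrfs code.

```python
class Subvolume:
    """Toy subvolume: `entries` maps names to files (str), dirs (dict),
    or nested Subvolume objects."""

    def __init__(self, name, entries=None):
        self.name = name
        self.entries = dict(entries or {})


def _copy_entries(entries):
    """Copy files and directories, but do not descend into nested subvolumes."""
    out = {}
    for name, entry in entries.items():
        if isinstance(entry, Subvolume):
            # The nested subvolume is not part of this tree, so the snapshot
            # only gets an empty directory where it used to be.
            out[name] = {}
        elif isinstance(entry, dict):
            out[name] = _copy_entries(entry)
        else:
            out[name] = entry
    return out


def snapshot(subvol, new_name):
    """Return a new Subvolume holding a non-recursive snapshot of `subvol`."""
    return Subvolume(new_name, _copy_entries(subvol.entries))


def demo():
    b = Subvolume("Subvolume B", {"File 3": "three", "File 4": "four"})
    a = Subvolume("Subvolume A", {"File 1": "one", "File 2": "two", "Subvolume B": b})
    return snapshot(a, "snapshot of A").entries
```

In this model, making every directory a subvolume would turn every directory in a snapshot into an empty placeholder, which is the limitation the message points out.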
Re: Q: Why subvolumes?
Yeah. I was merely curious about the architectural limits that drove the design this way to begin with, mostly because it seems odd. It seems like the most obvious and most natural thing from the user's perspective would be to just be able to reflink directories. Like every decent source control system that exists, for instance. So I figured there must be some very good reason it wasn't done like that.

I'm still not completely sure what that very good reason is. Obviously whatever structure currently exists for subvolumes would need to continue existing, to begin a unique inode scope. But since apparently the VFS can be instructed to plop a new dev_id anywhere in the hierarchy, I still don't see why explicit subvolumes are required. It seems more natural to be able to put a quota on a directory. To be able to set RAID policy on a directory. Compression on a directory. COW semantics on a directory. Etc.

Ahh well, some of you gave really nice detailed answers, and I appreciate that. Thanks.

On Tue, Jul 23, 2013 at 4:52 PM, Chris Murphy li...@colorremedies.com wrote:

> On Jul 23, 2013, at 1:43 PM, Jerome Haltom was...@cogito.cx wrote:
>
> > Why not just create the new dev_id on the destination snapshot of any directory?
>
> Right now, snapshots of subvolumes do not contain the contents of contained subvolumes. Hmmm, that sounds horrid.
>
> Subvolume A
>     File 1
>     File 2
>     Subvolume B
>         File 3
>         File 4
>
> If I snapshot Subvolume A, the resulting snapshot does not contain File 3 and File 4. Subvolume B is a regular folder in the snapshot of Subvolume A.
>
> So if every directory were a subvolume by default, this limitation would need to be resolved or snapshotting would become useless. I'm sure there's a more coherent explanation of why this isn't desired.
>
> > That way the snapshot can share inodes with its source.
>
> Snapshots already share inode numbers.
>
> Chris Murphy
Re: Q: Why subvolumes?
On Tue, Jul 23, 2013 at 06:39:57PM -0500, Jerome Haltom wrote:

> Yeah. I was merely curious about the architecture limits that drove the design this way, to begin with. Mostly because it seems odd. It seems like the most obvious and most natural thing from the user's perspective to do would just be able to reflink directories. Like every decent source control system that exists, for instance. So, I figured there must be some very good reason it wasn't done like that.
>
> I'm still not completely sure what that very good reason is. Obviously whatever structure that currently exists for subvolumes would need to continue existing, to begin a unique inode scope. But, since apparently the VFS can be instructed to plop a new dev_id anywhere in the hierarchy, I still don't see why explicit subvolumes are required. Seems more natural to be able to put a quota on a directory. To be able to set raid policy on a directory. Compression on a directory. COW semantics on a directory. Etc.
>
> Ahh well, some of you gave really nice detailed answers, and I appreciate that. Thanks.

Subvolumes are described as directories simply to make them easier to understand. Directories do not change the hierarchy within the file system itself; they are simply items in the b-tree like anything else, and they are not special at all. Subvolumes are _represented_ as directories, but really the directories are just links to subvolumes.

A subvolume is a completely separate b-tree: it has its own locking, its own inode numbering, and everything. And this isn't inode numbering for the sake of inode numbering; our inode numbers are picked by simply taking the next largest objectid we can add to our tree. Since a subvolume is its own tree, its inode numbers start over at the beginning.

So it's not that we can just fork off a directory and snapshot there, because a directory is not a tree, it's just an item. A subvolume is its own tree, which can be snapshotted and locked independently of the other subvolumes.

Thanks,

Josef
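To make the "next largest objectid" point concrete, here is a toy Python model of per-tree inode numbering. It is a sketch under stated assumptions, not btrfs code: `FsTree`, `create`, `snapshot`, the tree ids, and the starting value `FIRST_OBJECTID` are all illustrative names and values chosen for the example.

```python
FIRST_OBJECTID = 256  # illustrative starting value for a fresh tree


class FsTree:
    """Toy stand-in for one subvolume's tree: objectid -> item name."""

    def __init__(self, tree_id, items=None):
        self.tree_id = tree_id
        self.items = dict(items or {})

    def create(self, name):
        # The new inode number is one more than the largest objectid
        # currently present in this tree (or the starting value if empty).
        objectid = max(self.items, default=FIRST_OBJECTID - 1) + 1
        self.items[objectid] = name
        return objectid

    def snapshot(self, new_tree_id):
        # A snapshot begins with the same items, so it shares objectids
        # with its source; only the tree id differs.
        return FsTree(new_tree_id, self.items)


def demo():
    a = FsTree(tree_id=1)
    first = a.create("file1")
    second = a.create("file2")

    fresh = FsTree(tree_id=2)          # a brand-new subvolume
    restarted = fresh.create("other")  # numbering starts over here

    snap = a.snapshot(new_tree_id=3)
    in_a = a.create("new-in-a")
    in_snap = snap.create("new-in-snap")
    return first, second, restarted, in_a, in_snap
```

In this model the source and its snapshot each hand out the same next number to different files after the snapshot is taken. That is harmless because the two are separate trees, and it would be a collision if both files had to live in one tree, which is the objection to snapshotting a directory in place.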
Re: Q: Why subvolumes?
On Jul 23, 2013, at 7:27 PM, Josef Bacik jba...@fusionio.com wrote:

> Subvolumes are described as directories simply to make them easier to understand. Directories do not change the hierarchy within the file system itself; they are simply items in the b-tree like anything else, and they are not special at all. Subvolumes are _represented_ as directories, but really the directories are just links to subvolumes.
>
> A subvolume is a completely separate b-tree: it has its own locking, its own inode numbering, and everything. And this isn't inode numbering for the sake of inode numbering; our inode numbers are picked by simply taking the next largest objectid we can add to our tree. Since a subvolume is its own tree, its inode numbers start over at the beginning.
>
> So it's not that we can just fork off a directory and snapshot there, because a directory is not a tree, it's just an item. A subvolume is its own tree, which can be snapshotted and locked independently of the other subvolumes.

Thanks, I like this, it's useful. Could it be integrated into the Wiki?

Chris Murphy