Re: Cloning a Btrfs partition
On Wed, Dec 7, 2011 at 10:35 AM, BJ Quinn b...@placs.net wrote:
> I've got a 6TB btrfs array (two 3TB drives in a RAID 0). It's about 2/3 full and has lots of snapshots. I've written a script that runs through the snapshots and copies the data efficiently (rsync --inplace --no-whole-file) from the main 6TB array to a backup array, creating snapshots on the backup array and then continuing on to copy the next snapshot. Problem is, it looks like it will take weeks to finish.
>
> I've tried simply using dd to clone the btrfs partition, which technically appears to work, but then the UUID of the two arrays is identical, so I can only mount one or the other. This means I can't continue to simply update the backup array with the new snapshots created on the main array (my script is capable of catching the backup array up with the new snapshots, but not if I can't mount both arrays...). Any suggestions?

Until an analogue of zfs send is added to btrfs (and I believe there are some side projects under way to add something similar), your only option is the one you are currently using via rsync.

-- 
Freddie Cash
fjwc...@gmail.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
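The per-snapshot copy loop described in the quoted post can be sketched roughly as below. All paths and snapshot names are hypothetical (not from the original post), and the sketch only prints the commands it would run unless DRY_RUN is set to 0:

```shell
#!/bin/sh
# Sketch of the per-snapshot replication loop described above.
# Paths and snapshot names are hypothetical, not from the original post.
SRC=/mnt/main
DST=/mnt/backup
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands

run() {
    if [ "$DRY_RUN" -eq 1 ]; then echo "$@"; else "$@"; fi
}

backup_snapshot() {
    snap=$1
    # Rewrite only changed blocks so unchanged extents stay shared
    # between the snapshots on the backup array...
    run rsync --archive --inplace --no-whole-file --delete \
        "$SRC/$snap/" "$DST/current/"
    # ...then freeze the result as a matching snapshot on the backup array.
    run btrfs subvolume snapshot "$DST/current" "$DST/$snap"
}

for snap in snap-2011-12-01 snap-2011-12-02; do
    backup_snapshot "$snap"
done
```

In a real script the for-loop would iterate over the actual snapshot list (e.g. the output of `btrfs subvolume list`) in creation order.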
Re: btrfs subvolume snapshot syntax too smart
On Mon, Apr 4, 2011 at 12:47 PM, Goffredo Baroncelli kreij...@libero.it wrote:
> On 04/04/2011 09:09 PM, krz...@gmail.com wrote:
>> I understand btrfs' intent, but the same command run twice should not give different results. This really makes snapshot automation hard:
>>
>> root@sv12 [/ssd]# btrfs subvolume snapshot /ssd/sub1 /ssd/5
>> Create a snapshot of '/ssd/sub1' in '/ssd/5'
>> root@sv12 [/ssd]# btrfs subvolume snapshot /ssd/sub1 /ssd/5
>> Create a snapshot of '/ssd/sub1' in '/ssd/5/sub1'
>> root@sv12 [/ssd]# btrfs subvolume snapshot /ssd/sub1 /ssd/5
>> Create a snapshot of '/ssd/sub1' in '/ssd/5/sub1'
>> ERROR: cannot snapshot '/ssd/sub1'
>
> The same is true for cp:
>
> # cp -rf /ssd/sub1 /ssd/5   <- copies sub1 as 5
> # cp -rf /ssd/sub1 /ssd/5   <- copies sub1 into 5
>
> However, you are right. It could be fixed easily by adding a switch like --script, which forces the last part of the destination to be handled as the name of the subvolume, raising an error if it already exists. Is subvolume snapshot the only command which suffers from this kind of problem?

Isn't this a situation where supporting a trailing / would help?

With the / at the end, it means "put the snapshot into this folder". Thus "btrfs subvolume snapshot /ssd/sub1 /ssd/5/" would create a sub1 snapshot inside the 5/ folder. Running it a second time would error out since /ssd/5/sub1/ already exists. And if the 5/ folder doesn't exist, it would error out.

Without the / at the end, it means "name the snapshot this". Thus "btrfs subvolume snapshot /ssd/sub1 /ssd/5" would create a snapshot named /ssd/5. Running the command again would error out due to the snapshot already existing. If /ssd/5 doesn't exist, it's created; and it errors out if a 5/ folder already exists.

Or something along those lines, similar to how other apps work with/without a trailing /.
Re: efficiency of btrfs cow
On Sun, Mar 6, 2011 at 8:02 AM, Fajar A. Nugraha l...@fajar.net wrote:
> On Sun, Mar 6, 2011 at 10:46 PM, Brian J. Murrell br...@interlinx.bc.ca wrote:
>> # cp -al /backup/previous-backup/ /backup/current-backup
>> # rsync -aAHX ... --exclude /backup / /backup/current-backup
>>
>> The shortcoming of this, of course, is that it takes just 1 changed byte in a (possibly huge) file to require that the whole file be recopied to the backup.
>
> If you have snapshots anyway, why not:
> - create a snapshot before each backup run
> - use the same directory (e.g. just /backup), no need to cp anything
> - add --inplace to rsync

You may also want to test with/without --no-whole-file as well. That's most useful when the two filesystems are on the same system, and should reduce the amount of data copied around, as it forces rsync to use only file deltas. This is very much a win on ZFS, which is also CoW, so it should be a win on Btrfs.
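A minimal sketch of the snapshot-then-rsync cycle suggested above. The directory names are assumptions, and the script only prints the two commands so it is safe to run as-is:

```shell
#!/bin/sh
# Snapshot-first backup cycle sketch; directory names are assumptions.
BACKUP=/backup
TODAY=$(date +%Y-%m-%d)

# 1. Freeze the current state of the backup tree as a dated snapshot.
echo btrfs subvolume snapshot "$BACKUP" "$BACKUP-$TODAY"

# 2. Rsync into the same directory. --inplace rewrites changed blocks
#    instead of writing a temp file and renaming it over the original,
#    so unchanged extents stay shared with the snapshots;
#    --no-whole-file forces the delta algorithm even for local copies.
echo rsync -aAHX --inplace --no-whole-file --exclude /backup / "$BACKUP/"
```

The key design point is step order: snapshot first, then overwrite in place, so each dated snapshot preserves the previous run while the live tree converges on the current state.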
Re: btrfs wishlist
On Tue, Mar 1, 2011 at 10:39 AM, Chris Mason chris.ma...@oracle.com wrote:
> Excerpts from Roy Sigurd Karlsbakk's message of 2011-03-01 13:35:42 -0500:
>> - Pool-like management with multiple RAIDs/mirrors (VDEVs)
>
> We have a pool of drives now. I'm not sure exactly what the vdevs are. This functionality is in btrfs already, but it's using different terminology and configuration methods.

In ZFS, the lowest level in the storage stack is the physical block device. You group these block devices together into a virtual device (aka vdev). The possible vdevs are:
- single-disk vdev, with no redundancy
- mirror vdev, with any number of devices (n-way mirroring)
- raidz1 vdev, single-parity redundancy
- raidz2 vdev, dual-parity redundancy
- raidz3 vdev, triple-parity redundancy
- log vdev, a separate device for journaling, or as a write cache
- cache vdev, a separate device that acts as a read cache

A ZFS pool is made up of a collection of vdevs. For example, a simple, non-redundant pool setup for a laptop would be:

    zpool create laptoppool da0

To create a pool with a dual-parity vdev using 8 disks:

    zpool create mypool raidz2 da0 da1 da2 da3 da4 da5 da6 da7

To later add to the existing pool:

    zpool add mypool raidz2 da8 da9 da10 da11 da12 da13 da14 da15

Later, you create your ZFS filesystems on top of the pool.

With btrfs, you set up the redundancy and the filesystem all in one shot, thus combining the vdev with the pool (aka filesystem). ZFS has better separation of the different layers (device, pool, filesystem), and better tools for working with them (zpool / zfs), but similar functionality is (or at least appears to be) in btrfs already. Using device mapper / md underneath btrfs also gives you a similar setup to ZFS.
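For comparison, the rough btrfs analogues of the zpool examples above might look like the following. The device names are hypothetical, the raid10 profile is used because btrfs has no raidz-style parity profile at the time of writing, and the commands are printed rather than executed:

```shell
#!/bin/sh
# Hypothetical devices; prints rough btrfs analogues of the zpool
# commands above rather than running them.
btrfs_demo() {
    # 'zpool create' + 'zfs create' collapse into one mkfs plus subvolumes:
    echo mkfs.btrfs -d raid10 -m raid10 /dev/sda /dev/sdb /dev/sdc /dev/sdd
    echo mount /dev/sda /mypool
    echo btrfs subvolume create /mypool/home
    # Growing the pool later (the 'zpool add' analogue):
    echo btrfs device add /dev/sde /mypool
    echo btrfs filesystem balance /mypool
}
btrfs_demo
```

The balance after `device add` is what actually spreads existing data across the new disk; ZFS does the equivalent rebalancing only for newly written data.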
Re: Synching a Backup Server
On Sat, Jan 22, 2011 at 5:45 AM, Hugo Mills hugo-l...@carfax.org.uk wrote:
> On Fri, Jan 21, 2011 at 11:28:19AM -0800, Freddie Cash wrote:
>> So, is Btrfs pooled storage or not? Do you throw 24 disks into a single Btrfs filesystem, and then split that up into separate sub-volumes as needed?
>
> Yes, except that the subvolumes aren't quite as separate as you seem to think they are. There's no preallocation of storage to a subvolume (in the way that LVM works), so you're only limited by the amount of free space in the whole pool. Also, data stored in the pool is actually free for use by any subvolume, and can be shared (see the deeper explanation below).

Ah, perfect, that I understand. :)  It's the same with ZFS: you add storage to a pool, filesystems in the pool are free to use as much space as is available, and you don't have to pre-allocate or partition or anything like that. ZFS supports quotas and reservations, though, so you can (if you want/need) allocate bytes to specific filesystems.

>> From the looks of things, you don't have to partition disks or worry about sizes before formatting (if the space is available, Btrfs will use it). But it also looks like you still have to manage disks. Or, maybe it's just that the initial creation is done via mkfs (as in, formatting a partition with a filesystem) that's tripping me up after using ZFS for so long (zpool creates the storage pool, manages the disks, sets up redundancy levels, etc.; zfs creates filesystems and volumes, and sets properties; no newfs/mkfs involved).
>
> So potentially zpool -> mkfs.btrfs, and zfs -> btrfs. However, I don't know enough about ZFS internals to know whether this is a reasonable analogy to make or not.

That's what I figured. It's not a perfect analogue, but it's close enough. Clears things up a bit. The big difference is that ZFS separates storage management (the pool) from filesystem management, while btrfs creates a pool underneath one filesystem and allows you to split it up via sub-volumes. I think I'm figuring this out. :)

> Note that the actual file data, and the management of its location on the disk (and its replication), is completely shared across subvolumes. The same extent may be used multiple times by different files, and those files may be in any subvolumes on the filesystem. In theory, the same extent could even appear several times in the same file. This sharing is how snapshots and COW copies are implemented. It's also the basis for Josef's dedup implementation.

That's similar to how ZFS works, only it uses blocks instead of extents; it works in a similar manner. I think I've got this mostly figured out. Now, to just wait for multiple-parity redundancy (RAID5/6/+) support to hit the tree, so I can start playing around with it. :)

Thanks for taking the time to explain some things. Sorry if I came across as being harsh or whatnot.
Re: Synching a Backup Server
On Sun, Jan 9, 2011 at 10:30 AM, Hugo Mills hugo-l...@carfax.org.uk wrote:
> On Sun, Jan 09, 2011 at 09:59:46AM -0800, Freddie Cash wrote:
>> Let's see if I can match up the terminology and layers a bit:
>>
>> LVM Physical Volume == Btrfs disk == ZFS disk / vdevs
>> LVM Volume Group == Btrfs filesystem == ZFS storage pool
>> LVM Logical Volume == Btrfs subvolume == ZFS volume
>> 'normal' filesystem == Btrfs subvolume (when mounted) == ZFS filesystem
>>
>> Does that look about right?
>
> Kind of. The thing is that the way btrfs works is massively different from the way that LVM works (and probably massively different from the way that ZFS works, but I don't know much about ZFS, so I can't comment there). I think that trying to think of btrfs in LVM terms is going to lead you to a large number of incorrect conclusions. It's just not a good model to use.

My biggest issue trying to understand Btrfs is figuring out the layers involved.

With ZFS, it's extremely easy: disks -> vdevs -> pool -> filesystems
With LVM, it's fairly easy: disks -> volume group -> volumes -> filesystems
But Btrfs doesn't make sense to me: disks -> filesystem -> sub-volumes???

So, is Btrfs pooled storage or not? Do you throw 24 disks into a single Btrfs filesystem, and then split that up into separate sub-volumes as needed?

From the looks of things, you don't have to partition disks or worry about sizes before formatting (if the space is available, Btrfs will use it). But it also looks like you still have to manage disks. Or, maybe it's just that the initial creation is done via mkfs (as in, formatting a partition with a filesystem) that's tripping me up after using ZFS for so long (zpool creates the storage pool, manages the disks, sets up redundancy levels, etc.; zfs creates filesystems and volumes, and sets properties; no newfs/mkfs involved).

It looks like ZFS, Btrfs, and LVM should work in similar ways, but the overloaded terminology (pool, volume, sub-volume, and filesystem mean different things in all three) and the new terminology that's only in Btrfs are confusing.

>> Just curious, why all the new terminology in btrfs for things that already existed? And why are old terms overloaded with new meanings? I don't think I've seen a write-up about that anywhere (or I don't remember it if I have).
>
> The main awkward piece of btrfs terminology is the use of RAID to describe btrfs's replication strategies. It's not RAID, and thinking of it in RAID terms is causing lots of confusion. Most of the other things in btrfs are, I think, named relatively sanely.

No, the main awkward piece of btrfs terminology is overloading "filesystem" to mean "collection of disks" and creating "sub-volume" to mean "filesystem". At least, that's how it looks from way over here. :)

>> Perhaps it's time to start looking at separating the btrfs pool creation tools out of mkfs (or renaming mkfs.btrfs), since you're really building a storage pool, and not a filesystem. It would prevent a lot of confusion with new users. It's great that there's a separate btrfs tool for manipulating btrfs setups, but mkfs.btrfs is just wrong for creating the btrfs setup.
>
> I think this is the wrong thing to do. I hope my explanation above helps.

As I understand it, mkfs.btrfs is used to create the initial filesystem across X disks with Y redundancy. For everything else afterward, the btrfs tool is used to add disks, create snapshots, delete snapshots, change redundancy settings, create sub-volumes, etc. Why not just add a create option to btrfs and retire mkfs.btrfs completely? Or rework mkfs.btrfs to create sub-volumes of an existing btrfs setup?

What would be great is an image that showed the layers in Btrfs and how they interact with the userspace tools. Having a set of graphics that compared the layers in Btrfs with the layers in the normal Linux disk/filesystem partitioning scheme, and with the LVM layering, would be best. There's lots of info in the wiki, but no images, ASCII art, graphics, etc. Trying to picture this mentally is not working. :)
Re: Backup Command
On Fri, Jan 21, 2011 at 11:07 AM, cac...@quantum-sci.com wrote:
> Well, thanks to some help from you guys I seem to have my backup server almost fully running and functional with rsync. Amazing functions, this snapshotting and rsync. I still don't know why I cannot remove snapshots, though (Debian Testing with 2.6.32-28). And I don't know how to reach out from the backup server to the HTPC and stop MythTV there, so I can export the MySQL database safely, from a cron job on the backup server. Suggestions?

Simplified, but workable:

#!/bin/sh
ssh someu...@mythtv.pc /path/to/some/script stop
/path/to/your/rsync/script
ssh someu...@mythtv.pc /path/to/some/script start

The above script would be your backup wrapper script that gets called by cron.

On the HTPC, /path/to/some/script would be a script that takes one argument (stop|start). The stop argument would stop MythTV using its init script, then do whatever you need to do to the database (stop it, dump it, whatever). The start argument would do the reverse, starting the database and MythTV.

And /path/to/your/rsync/script would call your actual backup script that runs rsync.
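The stop|start helper on the HTPC could be sketched as below. The MythTV init-script name, database name, and dump path are all assumptions, and the sketch only prints the commands it would run so it is safe to try:

```shell
#!/bin/sh
# Sketch of the stop|start helper described above. The init-script
# name, database name, and dump path are assumptions; the commands
# are only printed here so the sketch is safe to run.
mythtv_ctl() {
    case "$1" in
        stop)
            echo /etc/init.d/mythtv-backend stop
            echo "mysqldump mythconverg > /var/backups/mythconverg.sql"
            ;;
        start)
            echo /etc/init.d/mythtv-backend start
            ;;
        *)
            echo "Usage: mythtv_ctl {stop|start}" >&2
            return 1
            ;;
    esac
}
```

A real version would run the commands instead of echoing them, and could also stop/start MySQL itself if a cold binary copy is wanted.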
Re: Synching a Backup Server
On Sun, Jan 9, 2011 at 7:32 AM, Alan Chandler a...@chandlerfamily.org.uk wrote:
> I think I start to get it now. It's the fact that subvolumes can be snapshotted etc. without mounting them that is the difference. I guess I am too used to thinking like LVM, and I was thinking subvolumes were like an LV.

They are, but not quite the same. Let's see if I can match up the terminology and layers a bit:

LVM Physical Volume == Btrfs disk == ZFS disk / vdevs
LVM Volume Group == Btrfs filesystem == ZFS storage pool
LVM Logical Volume == Btrfs subvolume == ZFS volume
'normal' filesystem == Btrfs subvolume (when mounted) == ZFS filesystem

Does that look about right?

LVM: physical volumes are the lowest layer; they are combined into a volume group, which is then split up into logical volumes, each formatted with a filesystem.

Btrfs: a bunch of disks are formatted into a btrfs filesystem, which is then split up into sub-volumes (sub-volumes are auto-formatted with a btrfs filesystem).

ZFS: a bunch of disks are combined into virtual devices, which are then combined into a ZFS storage pool, which can be split up into either volumes (formatted with any filesystem) or ZFS filesystems.

Just curious, why all the new terminology in btrfs for things that already existed? And why are old terms overloaded with new meanings? I don't think I've seen a write-up about that anywhere (or I don't remember it if I have).

Perhaps it's time to start looking at separating the btrfs pool creation tools out of mkfs (or renaming mkfs.btrfs), since you're really building a storage pool, and not a filesystem. It would prevent a lot of confusion with new users. It's great that there's a separate btrfs tool for manipulating btrfs setups, but mkfs.btrfs is just wrong for creating the btrfs setup.
Re: Various Questions
On Sat, Jan 8, 2011 at 5:25 AM, Carl Cook cac...@quantum-sci.com wrote:
> In addition to the questions below, if anyone has a chance could you advise on why my destination drive has more data than the source after this command:
>
> # rsync --hard-links --delete --inplace --archive --numeric-ids /media/disk/* /home
> sending incremental file list

What happens if you delete /home, then run the command again, but without the *? You generally don't use wildcards for the source or destination when using rsync; you just tell it which directory to start in.

If you do an "ls /home" and an "ls /media/disk", are they different?
Re: Various Questions
On Fri, Jan 7, 2011 at 9:15 AM, Carl Cook cac...@quantum-sci.com wrote:
> How do you know what options to rsync are on by default? I can't find this anywhere. For example, it seems to me that --perms -ogE --hard-links and --delete-excluded should be on by default, for a true sync?

Who cares which ones are on by default? List the ones you want to use on the command-line, every time. That way, if the defaults change, your setup won't.

> If using the --numeric-ids switch for rsync, do you just have to manually make sure the IDs and usernames are the same on source and destination machines?

You use the --numeric-ids switch so that it *doesn't* matter whether the IDs/usernames are the same. It just sends the ID number on the wire. Sure, if you do an ls on the backup box, the username will appear to be messed up. But if you compare the user ID assigned to the file with the user IDs in the backed-up etc/passwd file, they are correct. Then, if you ever need to restore the HTPC from backups, the etc/passwd file is transferred over, the user IDs are transferred over, and when you do an ls on the HTPC, everything matches up correctly.

> For files that fail to transfer, wouldn't it be wise to use --partial-dir=DIR to at least recover part of lost files?

Or just run rsync again, if the connection is dropped.

> The rsync man page says that rsync uses ssh by default, but is that the case? I think -e may be related to engaging ssh, but don't understand the explanation.

Does it matter what the default is, if you specify exactly how you want it to work on the command-line?

> So for my system where there is a backup server, I guess I run the rsync daemon on the backup server, which presents a port; then when the other systems decide it's time for a backup (cron) they:
> - stop mysql, dump the database somewhere, start mysql;
> - connect to the backup server's rsync port and dump their data to (hopefully) some specific place there.
> Right?

That's one way (push backups). It works OK for small numbers of systems being backed up. But get above a handful of machines, and it gets very hard to time everything so that you don't hammer the disks on the backup server.

Pull backups (the backup server does everything) work better, in my experience. Then you just script things up once, run one script, worry about one schedule, and everything is stored on the backup server. No need to run rsync daemons everywhere; just run the rsync client, using -e ssh, and let it do everything. If you need to run a script on the remote machine first, that's easy enough to do:
- ssh to the remote system, run a script to stop DBs, dump DBs, snapshot the FS, whatever
- then run rsync
- ssh to the remote system, run a script to start DBs, delete the snapshot, whatever

You're starting to over-think things. Keep it simple, don't worry about defaults, specify everything you want to do, and do it all from the backup box.
Re: Synching a Backup Server
On Thu, Jan 6, 2011 at 9:35 AM, Carl Cook cac...@quantum-sci.com wrote:
> I am setting up a backup server for the garage, to back up my HTPC in case of theft or fire. The HTPC has a 4TB RAID10 array (mdadm, JFS), and will be connected to the backup server using GB ethernet. The backup server will have a 4TB BTRFS RAID0 array. Debian Testing running on both.
>
> I want to keep a duplicate copy of the HTPC data on the backup server, and I think a regular full file copy is not optimal and may take days to do. So I'm looking for a way to sync the arrays at some interval. Ideally the sync would scan the HTPC with a CRC check to look for differences, copy over the differences, then email me on success. Is there a BTRFS tool that would do this?

No, but there's a great tool called rsync that does exactly what you want. :)  This is (basically) the same setup we use at work to back up all our remote Linux/FreeBSD systems to a central backup server (although our server runs FreeBSD+ZFS).

Just run rsync on the backup server, tell it to connect via ssh to the remote server, and rsync / (the root filesystem) into /backups/htpc/ (or whatever directory you want). Use an exclude file to exclude the directories you don't want backed up (like /proc, /sys, /dev). If you are comfortable compiling software, you should look into adding the HPN patches to OpenSSH and enabling the None cipher. That will give you a 30-40% increase in network throughput.

After the rsync completes, snapshot the filesystem on the backup server, using the current date for the name. Then repeat the rsync process the next day, into the exact same directory. Only files that have changed will be transferred. Then snapshot the filesystem using the current date. And repeat ad nauseam. :)

Some useful rsync options to read up on: --hard-links --numeric-ids --delete-during --delete-excluded --archive

The first time you run the rsync command, it will take a while, as it transfers every file on the HTPC to the backup server. However, you can stop and restart this process as many times as you like; rsync will just pick up where it left off.

> Also with this system, I'm concerned that if there is corruption on the HTPC, it could be propagated to the backup server. Is there some way to address this? Longer intervals to sync, so I have a chance to discover?

Using snapshots on the backup server allows you to go back in time to recover files that may have been accidentally deleted, or to recover files that have been corrupted.

Be sure to use rsync 3.x, as it starts transferring data a *lot* sooner, shortening the overall time needed for the sync. rsync 2.x scans the entire remote filesystem first, builds a list of files, then compares that list to the files on the backup server. rsync 3.x scans a couple of directories, then starts transferring data while scanning ahead.

Once you have a working command-line for rsync, adding it to a script and then using cron to schedule it completes the setup. Works beautifully. :)  Saved our bacon several times over the past 2 years.
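With a dated snapshot taken after every nightly run, the snapshots eventually need expiring. A small helper can pick which ones to delete; a sketch, assuming names that sort chronologically (snap-YYYY-MM-DD) and GNU head:

```shell
#!/bin/sh
# Print the snapshot names that fall outside the most recent $KEEP
# dated snapshots (names like snap-2011-01-06 sort chronologically).
# Running 'btrfs subvolume delete' on them is left to the caller.
KEEP=${KEEP:-30}

prune_candidates() {
    # $@ = list of snapshot names; prints all but the newest $KEEP
    printf '%s\n' "$@" | sort | head -n -"$KEEP"
}
```

Usage would be something like `prune_candidates $(ls /backups)` fed line-by-line into `btrfs subvolume delete`.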
Re: Synching a Backup Server
On Thu, Jan 6, 2011 at 11:33 AM, Marcin Kuk marcin@gmail.com wrote:
> Rsync is good, but not for all cases. Be aware of database files - you should snapshot the filesystem before rsyncing.

We script a dump of all databases before the rsync runs, so we get both text and binary backups. If restoring the binary files doesn't work, then we just suck in the text dumps.

If the remote system supports snapshots, doing a snapshot before the rsync runs is a good idea, though. It'll be nice when more filesystems support in-line snapshots. The LVM method is pure crap.
Re: Synching a Backup Server
On Thu, Jan 6, 2011 at 12:07 PM, C Anthony Risinger anth...@extof.me wrote:
> On Thu, Jan 6, 2011 at 1:47 PM, Freddie Cash fjwc...@gmail.com wrote:
>> On Thu, Jan 6, 2011 at 11:33 AM, Marcin Kuk marcin@gmail.com wrote:
>>> Rsync is good, but not for all cases. Be aware of database files - you should snapshot the filesystem before rsyncing.
>>
>> We script a dump of all databases before the rsync runs, so we get both text and binary backups. If restoring the binary files doesn't work, then we just suck in the text dumps. If the remote system supports snapshots, doing a snapshot before the rsync runs is a good idea, though.
>
> do you also use the --inplace option for rsync? i would think this is critical to getting the most out of btrfs folding backups, ie. the most reuse between snapshots? im about to set this exact method up for my home network, thats why i ask... i have a central server that runs everything, and i want to sync a couple laptops and netbooks nightly, and a few specific directories whenever they change. btrfs on both ends.

Yes, we do use --inplace; forgot about that one. The full rsync command used:

${rsync} ${rsync_options} \
  --exclude-from="${defaultsdir}/${rsync_exclude}" ${rsync_exclude_server} \
  --rsync-path="${rsync_path}" \
  --rsh="${ssh} -p ${rsync_port} -i ${defaultsdir}/${rsync_key}" \
  --log-file="${logdir}/${rsync_server}.log" \
  ${rsync_us...@${rsync_server}:${basedir}/ \
  "${backupdir}/${sitedir}/${serverdir}/${basedir}/"

Where rsync_options is: --archive --delete-during --delete-excluded --hard-links --inplace --numeric-ids --stats

> better yet, any chance you'd share some scripts? :-)

A description of what we use, including all scripts, is here: http://forums.freebsd.org/showthread.php?t=11971

> as for the DB stuff, you definitely need to snapshot _before_ rsync. roughly:
> 1) read lock and flush tables
> 2) snapshot
> 3) unlock tables
> 4) mount snapshot
> 5) rsync from snapshot

Unfortunately, we don't use btrfs or LVM on the remote servers, so there's no snapshotting available during the backup run. In a perfect world, btrfs would be production-ready, ZFS would be available on Linux, and we'd no longer need the abomination called LVM. :)  Until then, DB text dumps are our fall-back. :)
Re: Synching a Backup Server
On Thu, Jan 6, 2011 at 1:06 PM, Gordan Bobic gor...@bobich.net wrote:
>> Unfortunately, we don't use btrfs or LVM on remote servers, so there's no snapshotting available during the backup run. In a perfect world, btrfs would be production-ready, ZFS would be available on Linux, and we'd no longer need the abomination called LVM. :)
>
> As a matter of fact, ZFS _IS_ available on Linux: http://zfs.kqinfotech.com/

Available, usable, and production-ready are not synonymous. :)  ZFS on Linux is not even at the experimental/testing stage right now.

ZFS-fuse is good for proof-of-concept stuff, but chokes under heavy usage, especially with dedupe enabled. We tried it for a couple of weeks to see what was available in ZFS versions above 14, but couldn't keep it running for more than a day or two at a time. Supposedly things are better now, but I wouldn't trust 15 TB of backups to it. :)

The Lawrence Livermore ZFS module for Linux doesn't support ZFS filesystems yet, only ZFS volumes. It should be usable as an LVM replacement, though, or as an iSCSI target box. Haven't tried it yet.

The Middle-East (forget which country it's from) ZFS module for Linux is in the private beta stage, but only available for a few distros and kernel versions, and is significantly slower than ZFS on FreeBSD. Hopefully it will enter public beta this year; it sounds promising. Don't think I'd trust 15 TB of backups to it for at least another year, though.

If btrfs gets dedupe, nicer disk management (it's hard to use non-pooled storage now), a working fsck (or similar), and integration into Debian, then we may look at that as well. :)
Re: Synching a Backup Server
On Thu, Jan 6, 2011 at 1:42 PM, Carl Cook cac...@quantum-sci.com wrote:
> On Thu 06 January 2011 11:16:49 Freddie Cash wrote:
>>> Also with this system, I'm concerned that if there is corruption on the HTPC, it could be propagated to the backup server. Is there some way to address this? Longer intervals to sync, so I have a chance to discover?
>>
>> Using snapshots on the backup server allows you to go back in time to recover files that may have been accidentally deleted, or to recover files that have been corrupted.
>
> How? I can see that rsync will not transfer the files that have not changed, but I assume it transfers the changed ones. How can you go back in time? Is there like a snapshot file that records the state of all files there?

I don't know the specifics of how it works in btrfs, but it should be similar to how ZFS does it. The gist of it is:

Each snapshot gives you a point-in-time view of the entire filesystem. Each snapshot can be mounted (in ZFS, read-only; in btrfs, read-only or read-write). So you mount the snapshot for 2010-12-15 onto /mnt, then cd to the directory you want (/mnt/htpc/home/fcash/videos/) and copy out the file that you want to restore (cp coolvid.avi ~/).

With ZFS, things are nice and simple:
- each filesystem has a .zfs/snapshot directory
- in there are sub-directories, each named after a snapshot
- cd into the snapshot name, the OS auto-mounts the snapshot, and off you go

Btrfs should be similar? Don't know the specifics. How it works internally is some of the magic and the beauty of copy-on-write filesystems. :)
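The restore path described above might look like the following on btrfs. The snapshot name, device, and file paths are all hypothetical examples, and the commands are printed rather than run:

```shell
#!/bin/sh
# Restore a single file from a dated snapshot, as described above.
# The snapshot name, device, and file paths are hypothetical; the
# commands are printed so the sketch is safe to run.
restore_demo() {
    echo mount -o subvol=snap-2010-12-15 /dev/sdb1 /mnt
    echo cp /mnt/htpc/home/fcash/videos/coolvid.avi "$HOME/"
    echo umount /mnt
}
restore_demo
```

The point is that nothing is "undone" on the live filesystem; the snapshot is simply another mountable view of the data as it was on that date.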
Re: Synching a Backup Server
On Thu, Jan 6, 2011 at 1:44 PM, Carl Cook cac...@quantum-sci.com wrote:
> On Thu 06 January 2011 12:07:17 C Anthony Risinger wrote:
>> as for the DB stuff, you definitely need to snapshot _before_ rsync. roughly:
>> 1) read lock and flush tables
>> 2) snapshot
>> 3) unlock tables
>> 4) mount snapshot
>> 5) rsync from snapshot
>> ie. the same as what's needed for LVM: http://blog.dbadojo.com/2007/09/mysql-backups-using-lvm-snapshots.html
>> to get the DB file on disk consistent prior to archiving.
>
> I'm a little alarmed by this. Running a mysql server for the MythTV database. Do these operations need to somehow be done before rsync? Or else? I don't understand what you're saying.

Simplest solution is to write a script that creates a mysqldump of all databases into a directory, and add that to cron so that it runs at the same time every day, 10-15 minutes before the rsync run. That way, the rsync to the backup server picks up both the text dump of the database(s) and the binary files under /var/lib/mysql/* (the actual running database).

When you need to restore the HTPC due to a failed harddrive or whatnot, you just rsync everything back to the new harddrive and try to run MythTV. If things work, great, done. If something is wonky, then delete all the MySQL tables/databases and use the dump file to recreate things.

Something like this:

#!/bin/bash
# Backup mysql databases.
#
# Take a list of databases, and dump each one to a separate file.

debug=0

while getopts hv OPTION; do
    case ${OPTION} in
        h)  echo "Usage: $0 [-h] [-v]"
            echo ""
            echo "-h    show this help blurb"
            echo "-v    be verbose about what's happening"
            exit 0
            ;;
        v)  debug=1
            ;;
    esac
done

for I in $( mysql -u root --password=blahblahblah -Bse "show databases" ); do
    OUTFILE="/var/backups/$I.sql"

    if [ $debug = 1 ]; then
        echo -n "Doing backup for $I: "
    fi

    /usr/bin/mysqldump -u root --password=blahblahblah --opt $I > $OUTFILE
    /bin/chmod 600 $OUTFILE

    if [ $debug = 1 ]; then
        echo "done."
    fi
done

exit 0

That will create a text dump of everything in each database, creating a separate file per database. It can be used via the mysql command to recreate the database at a later date.
Re: Offline Deduplication for Btrfs
On Wed, Jan 5, 2011 at 11:46 AM, Josef Bacik jo...@redhat.com wrote: Dedup is only useful if you _know_ you are going to have duplicate information, so the two major use cases that come to mind are

1) Mail server. You have small files, probably less than 4k (blocksize), that you are storing hundreds to thousands of. Using dedup would be good for this case, and you'd have to have a small dedup blocksize for it to be useful.

2) Virtualized guests. If you have 5 different RHEL5 virt guests, chances are you are going to share data between them, but unlike with the mail server example, you are likely to find much larger chunks that are the same, so you'd want a larger dedup blocksize, say 64k. You want this because if you did just 4k you'd end up with a ridiculous amount of fragmentation and performance would go down the toilet, so you need a larger dedup blocksize to make for better performance.

You missed the most obvious, and useful, use case for dedupe: a central backups server. Our current backup server does an rsync backup of 127 servers every night into a single ZFS pool. 90+ of those servers are identical Debian installs (school servers), 20-odd are identical FreeBSD installs (firewalls/routers), and the rest are mail/web/db servers (Debian, Ubuntu, RedHat, Windows).

Just as a test, we copied a week of backups to a Linux box running ZFS-fuse with dedupe enabled, and had a combined dedupe/compress ratio in the low double digits (11 or 12x, something like that). Now we're just waiting for ZFSv22+ to hit FreeBSD to enable dedupe on the backups server.

For backups, and central storage for VMs, online dedupe is a massive win. Offline, maybe. Either way, dedupe is worthwhile.

-- Freddie Cash fjwc...@gmail.com
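As a rough way to see how much a backup set like that would dedupe, you can hash fixed-size blocks and count the repeats. This is only a toy sketch — real dedupe bookkeeping happens inside ZFS/btrfs, and the demo files, directory, and 64K blocksize here are invented for illustration (it also relies on GNU split's --filter option):

```shell
#!/bin/sh
# Toy offline estimate of block-level duplication: hash every 64K block
# of every file under $DIR, then compare total blocks vs unique hashes.
# Uses GNU split's --filter to hash chunks without writing them to disk.

DIR=${1:-/tmp/dedupe-demo}

# Build a small demo tree: two identical "server backups" plus one unique file.
mkdir -p "$DIR"
head -c 131072 /dev/urandom > "$DIR/server1.img"   # 2 blocks
cp "$DIR/server1.img" "$DIR/server2.img"           # same 2 blocks again
head -c 65536 /dev/urandom > "$DIR/unique.img"     # 1 block

hashes=$(mktemp)
find "$DIR" -type f | while read -r f; do
    split --bytes=64K --filter='sha256sum' "$f"
done | awk '{print $1}' > "$hashes"

total=$(wc -l < "$hashes")
unique=$(sort -u "$hashes" | wc -l)
echo "blocks=$total unique=$unique"                # here: blocks=5 unique=3
```

With identical OS images dominating the tree, unique comes out far below total — that gap is the space an online dedupe would reclaim.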
Re: Offline Deduplication for Btrfs
On Wed, Jan 5, 2011 at 12:15 PM, Josef Bacik jo...@redhat.com wrote: Yeah for things where you are talking about sending it over the network or something like that every little bit helps. I think deduplication is far more interesting and useful at an application level than at a filesystem level. For example with a mail server, there is a good chance that the files will be smaller than a blocksize and not be able to be deduped, but if the application that was storing them recognized that it had the same messages and just linked everything in its own stuff then that would be cool. Thanks,

Cyrus IMAP and Zimbra (and probably a lot of others) already do that, hard-linking identical message bodies. The e-mail server use case for dedupe is pretty much covered already.

-- Freddie Cash fjwc...@gmail.com
Re: Offline Deduplication for Btrfs
On Wed, Jan 5, 2011 at 5:03 PM, Gordan Bobic gor...@bobich.net wrote: On 01/06/2011 12:22 AM, Spelic wrote: Definitely agree that it should be a per-directory option, rather than per mount.

JOOC, would the dedupe table be done per directory, per mount, per sub-volume, or per volume? The larger the pool of data to check against, the better your dedupe ratios will be. I'm not up to date on all the terminology that btrfs uses, and how it compares to ZFS (disks - vdevs - pool - filesystem/volume), so the terms above may be incorrect. :)

In the ZFS world, dedupe is done pool-wide in that any block in the pool is a candidate for dedupe, but the dedupe property can be enabled/disabled on a per-filesystem basis. Thus, only blocks in filesystems with the dedupe property enabled will be deduped, but blocks from any filesystem can be compared against.

This is the point I was making - you end up paying double the cost in disk I/O and the same cost in CPU terms if you do it offline. And I am not convinced the overhead of calculating checksums is that great. There are already similar overheads in checksums being calculated to enable smart data recovery in case of silent disk corruption. Now that I mention it, that's an interesting point. Could these be unified? If we crank up the checksums on files a bit, to something suitably useful for deduping, it could make the deduping feature almost free.

This is what ZFS does. Every block in the pool has a checksum attached to it. Originally, the default algorithm was fletcher2, with fletcher4 and sha256 as alternates. When dedupe was enabled, the default was changed to fletcher4. Dedupe also came with the option to enable/disable a byte-for-byte verify when the hashes match. By switching the checksum algorithm for the pool to sha256 ahead of time, you can enable dedupe, and get the dedupe checksumming for free. :)

Also, the OS is small even if identical on multiple virtual images, how much is it going to occupy anyway? Less than 5GB per disk image usually. And that's the only thing that would be deduped, because the data is likely to be different on each instance. How many VMs do you have running? 20? That's at most 100GB saved one-time at the cost of a lot of fragmentation.

That's also 100GB fewer disk blocks in contention for page cache. If you're hitting the disks, you're already going to slow down by several orders of magnitude. Better to make the caching more effective.

If you set up your VMs as diskless images, using NFS off a storage server running whatever FS using dedupe, you can get a lot more out of it than using disk image files (where you have all the block sizes and alignment to worry about). And then you can use all the fancy snapshotting, cloning, etc. features of whatever FS as well.

-- Freddie Cash fjwc...@gmail.com
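The enable/disable byte-for-byte verify that ZFS pairs with its checksums can be mimicked in userspace. A toy sketch of the decision only — ZFS does this per block inside the pool, and the file names below are made up for the demo:

```shell
#!/bin/sh
# Sketch of dedupe's match logic: two blocks are only treated as
# duplicates if their sha256 hashes match AND, when verify is enabled,
# a byte-for-byte comparison also succeeds (guarding against collisions).

same_block() {   # usage: same_block FILE1 FILE2 [verify]
    h1=$(sha256sum "$1" | awk '{print $1}')
    h2=$(sha256sum "$2" | awk '{print $1}')
    [ "$h1" = "$h2" ] || return 1        # hashes differ: not a duplicate
    if [ "$3" = "verify" ]; then
        cmp -s "$1" "$2" || return 1     # paranoid byte-for-byte check
    fi
    return 0                             # safe to keep just one copy
}

printf 'hello' > /tmp/blk1
printf 'hello' > /tmp/blk2
printf 'world' > /tmp/blk3

same_block /tmp/blk1 /tmp/blk2 verify && echo "blk1/blk2: dedupe"
same_block /tmp/blk1 /tmp/blk3 verify || echo "blk1/blk3: keep both"
```

With a strong hash like sha256 the verify step is mostly paranoia, which is why ZFS makes it optional; with a weaker, faster hash like fletcher4 it's the safety net.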
Re: Appending data to the middle of a file using btrfs-specific features
On Mon, Dec 6, 2010 at 11:14 AM, Nirbheek Chauhan nirbheek.chau...@gmail.com wrote: As an aside, my primary motivation for this was that doing an incremental backup of things like git bare repositories and databases using btrfs subvolume snapshots is expensive w.r.t. disk space. Even though rsync calculates a binary delta before transferring data, it has to write everything out (except if just appending). So in that case, each incremental backup is hardly so.

Since btrfs is Copy-on-Write, have you experimented with --inplace on the rsync command line? That way, rsync writes the changes over top of the existing file, allowing btrfs to only write out the blocks that have changed, via CoW. We do this with our ZFS rsync backups, and found disk usage to go way down compared to the default write-out-new-data-to-new-file, rename-overtop method that rsync uses.

There's also the --no-whole-file option, which causes rsync to only send delta changes for existing files - another useful feature with CoW filesystems.

-- Freddie Cash fjwc...@gmail.com
Re: Appending data to the middle of a file using btrfs-specific features
On Mon, Dec 6, 2010 at 12:30 PM, Nirbheek Chauhan nirbheek.chau...@gmail.com wrote: On Tue, Dec 7, 2010 at 1:05 AM, Freddie Cash fjwc...@gmail.com wrote: On Mon, Dec 6, 2010 at 11:14 AM, Nirbheek Chauhan nirbheek.chau...@gmail.com wrote: As an aside, my primary motivation for this was that doing an incremental backup of things like git bare repositories and databases using btrfs subvolume snapshots is expensive w.r.t. disk space. Even though rsync calculates a binary delta before transferring data, it has to write everything out (except if just appending). So in that case, each incremental backup is hardly so. Since btrfs is Copy-on-Write, have you experimented with --inplace on the rsync command-line? That way, rsync writes the changes over-top of the existing file, thus allowing btrfs to only write out the blocks that have changed, via CoW? We do this with our ZFS rsync backups, and found disk usage to go way down over the default write out new data to new file, rename overtop method that rsync uses. There's also the --no-whole-file option which causes rsync to only send delta changes for existing files, another useful feature with CoW filesystems. I had tried the --inplace option, but it didn't seem to do anything for me, so I didn't explore that further. However, after following your suggestion and retrying with --no-whole-file, I see that the behaviour is quite different! It seems that --whole-file is enabled by default for local file transfers, and so --inplace had no effect. Yes, correct, --whole-file is used for local transfers since it's assumed you have all the disk I/O in the world, so why try to limit the amount of data transferred. :) But the behaviour of --inplace is not entirely to write out *only* the blocks that have changed. 
From what I could make out, it does the following: (1) calculate a delta between the src and trg files, (2) seek to the first difference in the target file, (3) start writing data.

That may be true; I've never looked into the actual algorithm(s) that rsync uses. Just played around with CLI options until we found the set that works best in our situation (--inplace --delete-during --no-whole-file --numeric-ids --hard-links --archive, over SSH with HPN patches).

I'm glossing over the final step because I didn't look deeper, but I think you can safely assume that after the first difference, all data is rewritten. So this is halfway between rewrite the whole file and write only the changed bits into the file. It doesn't actually use any CoW features from what I can see. There is lots of room for btrfs reflinking magic. :) Note that I tested this behaviour on a btrfs partition with a vanilla rsync-3.0.7 tarball; the copy you use with ZFS might be doing some CoW magic.

All the CoW magic is handled by the filesystem, not the tools on top. If the tool only updates X bytes, which fit into 1 block on the fs, then only that 1 block gets updated via CoW. Personally, I don't think the tools need to be updated to understand CoW or to integrate with the underlying FS. Instead, they should just operate on blocks of X size, and let the FS figure out what to do. Otherwise, you end up with an rsync for ZFS, an rsync for btrfs, an rsync for FAT32, etc. But, I'm just a lowly sysadmin, what do I know about filesystem internals? ;)

-- Freddie Cash fjwc...@gmail.com
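The win from updating in place on a CoW filesystem comes down to this: rewriting a block at its existing offset (instead of writing a whole new file and renaming it over) leaves the filesystem only one block to copy-on-write. A sketch with plain dd — the file name and the 4K blocksize are arbitrary choices for the demo:

```shell
#!/bin/sh
# Overwrite only the second 4K block of an existing file, the way an
# in-place update would -- a CoW filesystem then allocates one new
# block instead of re-writing the whole file somewhere else.

f=/tmp/inplace-demo
head -c 16384 /dev/zero > "$f"                     # 4 blocks of 4K

printf 'CHANGED!' |
    dd of="$f" bs=4096 seek=1 conv=notrunc 2>/dev/null

wc -c < "$f"                                       # still 16384: not truncated
dd if="$f" bs=4096 skip=1 count=1 2>/dev/null | head -c 8 ; echo
```

conv=notrunc is the important part: without it, dd truncates the file at the end of the write, which is the whole-file-rewrite behaviour you're trying to avoid.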
Re: What to do about subvolumes?
On Wed, Dec 1, 2010 at 11:35 AM, Hugo Mills hugo-l...@carfax.org.uk wrote: On Wed, Dec 01, 2010 at 12:38:30PM -0500, Josef Bacik wrote: If you delete your subvolume A, like use the btrfs tool to delete it, you will only be stuck with what you changed in snapshot B. So if you only changed 5gig worth of information, and you deleted the original subvolume, you would have 5gig charged to your quota. This doesn't work, though, if the owners of the original and new subvolume are different: Case 1: * Porthos creates 10G data. * Athos makes a snapshot of Porthos's data. * A sysadmin (Richelieu) changes the ownership on Athos's snapshot of Porthos's data to Athos. * Porthos deletes his copy of the data. Case 2: * Porthos creates 10G of data. * Athos makes a snapshot of Porthos's data. * Porthos deletes his copy of the data. * A sysadmin (Richelieu) changes the ownership on Athos's snapshot of Porthos's data to Athos. Case 3: * Porthos creates 10G data. * Athos makes a snapshot of Porthos's data. * Aramis makes a snapshot of Porthos's data. * A sysadmin (Richelieu) changes the ownership on Athos's snapshot of Porthos's data to Athos. * Porthos deletes his copy of the data. Case 4: * Porthos creates 10G data. * Athos makes a snapshot of Porthos's data. * Aramis makes a snapshot of Athos's data. * Porthos deletes his copy of the data. [Consider also Richelieu changing ownerships of Athos's and Aramis's data at alternative points in this sequence] In each of these, who gets charged (and how much) for their copy of the data? The idea is you are only charged for what blocks you have on the disk. Thanks, My point was that it's perfectly possible to have blocks on the disk that are effectively owned by two people, and that the person to charge for those blocks is, to me, far from clear. You either end up charging twice for a single set of blocks on the disk, or you end up in a situation where one person's actions can cause another person's quota to fill up. 
Neither of these is particularly obvious behaviour.

As a sysadmin and as a user, quotas shouldn't be about physical blocks of storage used; they should be about logical storage used. IOW, if the filesystem is compressed, using 1 GB of physical space to store 10 GB of data, my quota used should be 10 GB. Similar for deduplication: the quota is based on the storage *before* the file is deduped, not after. Similar for snapshots: if UserA has 10 GB of quota used and I snapshot their filesystem, then my quota used would be 10 GB as well. As data in my snapshot changes, my quota used is updated to reflect that (change 1 GB of data compared to the snapshot, use 1 GB of quota).

You have to (or at least should) keep two sets of stats for storage usage:
- logical amount used (real file size, before compression, before dedupe, before snapshots, etc.)
- physical amount used (what's actually written to disk)

User-level quotas are based on the logical storage used. Admin-level quotas (if you want to implement them) would be based on physical storage used. Thus, the output of things like df, du, and ls would show the logical storage used and file sizes, and you would have an additional option to those apps (--real or something) to show the actual storage used and file sizes as stored on disk.

Trying to make quotas and disk usage utilities work based on what's physically on disk is just backwards, imo. And prone to a lot of confusion.

-- Freddie Cash fjwc...@gmail.com
Re: What to do about subvolumes?
On Wed, Dec 1, 2010 at 1:28 PM, Hugo Mills hugo-l...@carfax.org.uk wrote: On Wed, Dec 01, 2010 at 12:24:28PM -0800, Freddie Cash wrote: On Wed, Dec 1, 2010 at 11:35 AM, Hugo Mills hugo-l...@carfax.org.uk wrote: The idea is you are only charged for what blocks you have on the disk. Thanks, My point was that it's perfectly possible to have blocks on the disk that are effectively owned by two people, and that the person to charge for those blocks is, to me, far from clear. You either end up charging twice for a single set of blocks on the disk, or you end up in a situation where one person's actions can cause another person's quota to fill up. Neither of these is particularly obvious behaviour. As a sysadmin and as a user, quotas shouldn't be about physical blocks of storage used but should be about logical storage used. IOW, if the filesystem is compressed, using 1 GB of physical space to store 10 GB of data, my quota used should be 10 GB. Similar for deduplication. The quota is based on the storage *before* the file is deduped. Not after. Similar for snapshots. If UserA has 10 GB of quota used, I snapshot their filesystem, then my quota used would be 10 GB as well. As data in my snapshot changes, my quota used is updated to reflect that (change 1 GB of data compared to snapshot, use 1 GB of quota). So if I've got 10G of data, and I snapshot it, I've just used another 10G of quota? Sorry, forgot the per user bit above. If UserA has 10 GB of data, then UserB snapshots it, UserB's quota usage is 10 GB. If UserA has 10 GB of data and snapshots it, then only 10 GB of quota usage is used, as there is 0 difference between the snapshot and the filesystem. As UserA modifies data, their quota usage increases by the amount that is modified (ie 10 GB data, snapshot, modify 1 GB data == 11 GB quota usage). 
If you combine the two scenarios, you end up with:
- UserA has 10 GB of data == 10 GB quota usage
- UserB snapshots UserA's filesystem (clone), so UserB has 10 GB quota usage (even though 0 blocks have changed on disk)
- UserA snapshots UserA's filesystem == no change to quota usage (no blocks on disk have changed)
- UserA modifies 1 GB of data in the filesystem == 1 GB new quota usage (11 GB total) (1 GB of blocks owned by UserA have changed, plus the 10 GB in the snapshot)
- UserB still only has 10 GB quota usage, since their snapshot hasn't changed (0 blocks changed)

If UserA deletes their filesystem and all their snapshots, freeing up 11 GB of quota usage on their account, UserB's quota will still be 10 GB, and the blocks on the disk aren't actually removed (still referenced by UserB's snapshot).

Basically, within a user's account, only the data unique to a snapshot should count toward the quota. Across accounts, the original (root) snapshot would count completely to the new user's quota, and then only data unique to subsequent snapshots would count.

I hope that makes it more clear. :) All the different layers and whatnot get confusing. :)

-- Freddie Cash fjwc...@gmail.com
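The charging rules in that walkthrough are mechanical enough to check with a few lines of shell arithmetic. This is purely a toy model of the proposal being discussed, not anything btrfs implements:

```shell
#!/bin/sh
# Toy model of the logical-quota rules above:
#  - snapshotting your OWN data charges nothing (zero changed blocks)
#  - snapshotting someone ELSE's data charges its full logical size
#  - modifying data after a snapshot charges only the delta

userA=0
userB=0

userA=$((userA + 10))   # UserA writes 10 GB of data          -> A=10
userB=$((userB + 10))   # UserB snapshots UserA's filesystem  -> B=10
userA=$((userA + 0))    # UserA snapshots own filesystem      -> A=10
userA=$((userA + 1))    # UserA modifies 1 GB of data         -> A=11

echo "UserA: ${userA} GB, UserB: ${userB} GB"
```

Running the same scenario through the model reproduces the totals above: UserA ends at 11 GB, UserB at 10 GB, regardless of what the other user deletes.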
Re: Blog: BTRFS is effectively stable
On Fri, Oct 29, 2010 at 4:38 PM, Chris Samuel ch...@csamuel.org wrote: A friend of mine who builds storage systems designed for HPC use has been keeping an eye on btrfs and has just done some testing of it with 2.6.36 and seems to like what he sees in terms of stability.

That's a *very* misleading conclusion to come to based solely on a single file I/O test. It's more realistic to say stable under fio load in ideal conditions. For example:
- No device-yanking tests were done.
- No power-cord yanking tests were done.
- No device cables were yanked, shaken, or plugged/unplugged in rapid succession.
- No dd the raw device underneath the filesystem while doing file I/O tests were done.
- No recovery tests were done.

IOW, you can't really say it's stable across the board like that.

-- Freddie Cash fjwc...@gmail.com
Re: Francis Galiegue would like your help testing a survey
So far, very nice. Some comments inline below.

On Tue, Sep 28, 2010 at 8:07 AM, Francis Galiegue fgalie...@gmail.com wrote: On Tue, Sep 28, 2010 at 16:57, David Pottage da...@electric-spoon.com wrote: On 28/09/10 15:27, Francis Galiegue wrote: Here is a preview of the survey. I have not included *all* feature requests yet, otherwise it wouldn't fit on a screen :), but I think I have chosen the most important ones. Please comment! Click on the following link to test this survey: http://appv3.sgizmo.com/testsurvey/survey?id=376617crc=98980edfce58a795c966488276754ddb

A lot of the questions depend on whether the user is a btrfs user or not. It would be nice to ask that as a first question, and then to hide some questions depending on the answer.

Yep, but the problem is, as far as I can see, you don't have this option for the type of account (free) I'm using on the site :/

Perhaps add a separate choice (Do not currently use btrfs) to each question after number 6? That way, non-users like me can just breeze through the rest of the survey, but you still get the information from us for the first half of the survey.

-- Freddie Cash fjwc...@gmail.com
Re: btrfs user survey?
On Mon, Sep 27, 2010 at 3:44 PM, Chris Ball c...@laptop.org wrote: Well, all in all, you get the idea, and I'm probably not the guy to craft questions for such a survey. But having input from as large a panel of users as possible would be a nice thing to have.

Your questions are fine -- I might add: * Rank the following future features in importance, 4 == most important
[ ] working fsck
[ ] GUIs for userspace actions e.g. snapshots
[ ] data deduplication
[ ] hot data relocation

You're missing RAID levels above 1, and deduplication. :) (And probably a few others.)

Please use something like Google Spreadsheets (which has a forms option) if you're going to run such a survey, rather than having everyone reply on-list -- we shouldn't bother this list with any results other than the final summary.

Something like SurveyMonkey (http://www.surveymonkey.com) or SurveyGizmo (http://www.surveygizmo.com) or similar would be better, as it does all the reporting for you, and builds a nice survey interface with checkboxes, radio buttons, text fields, etc. And they're still free (as in beer).

-- Freddie Cash fjwc...@gmail.com
Re: remote mirroring in the works?
On Mon, Aug 30, 2010 at 2:15 PM, Fred van Zwieten fvzwie...@gmail.com wrote: I just glanced over the DRBD/LVM combo, but I don't see it being functionally equal to SnapMirror. Let me (try to) explain how SnapMirror works: On system A there is a volume (vol1). We let this vol1(A) replicate through SnapMirror to vol1(B). This is done by creating a snapshot vol1sx(A) and replicating all changed blocks between this snapshot (x) and the previous snapshot (x-1). The first time, there is no x-1 and the whole volume will be replicated, but after this initial full copy, only the changed blocks between the two snapshots are replicated to system B. This is also called snap-based replication.

Why do we want this? Easy: to support consistent DB snapshots. The process works by first putting the DB in a consistent mode (depends on the DB implementation), creating a snapshot, letting the DB continue, and replicating the changes. This way a consistent DB state is replicated. The cool thing about the NetApp implementation is that on system B the snapshots (x, x-1, x-2, etc.) are also available. When there is trouble, you can choose to bring the DB online on system B from any of the snapshots, or, even cooler, replicate one of those snapshots back to system A, doing a block-based rollback at the filesystem level.

In the ZFS world, this would be the zfs send and zfs recv functionality. In case anyone wants to read up on how it works over there, for ideas on how it could be implemented for btrfs in the future.

-- Freddie Cash fjwc...@gmail.com
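For anyone unfamiliar with the ZFS side, the send/recv workflow looks roughly like this — the host names, pool, and dataset names are made up, and this is a sketch rather than a tested recipe:

```shell
# One-time full replication of tank/db from hostA to hostB:
zfs snapshot tank/db@s1
zfs send tank/db@s1 | ssh hostB zfs recv backup/db

# Afterwards, only the blocks changed between snapshots cross the wire:
zfs snapshot tank/db@s2
zfs send -i tank/db@s1 tank/db@s2 | ssh hostB zfs recv backup/db
```

The incremental (-i) form is the analogue of SnapMirror's changed-blocks-between-x-and-x-1 replication, and the receiving side keeps the intermediate snapshots available for rollback.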
Re: Is there a more aggressive fixer than btrfsck?
On Tue, Jun 29, 2010 at 3:37 AM, Daniel Kozlowski dan.kozlow...@gmail.com wrote: On Mon, Jun 28, 2010 at 10:31 PM, Rodrigo E. De León Plicet rdele...@gmail.com wrote: On Mon, Jun 28, 2010 at 8:48 AM, Daniel Kozlowski dan.kozlow...@gmail.com wrote: Sean Bartell wingedtachikoma at gmail.com writes: Is there a more aggressive filesystem restorer than btrfsck? It simply gives up immediately with the following error: btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.

btrfsck currently only checks whether a filesystem is consistent. It doesn't try to perform any recovery or error correction at all, so it's mostly useful to developers. Any error handling occurs while the filesystem is mounted.

Is there any plan to implement this functionality? It would seem to me to be a pretty basic feature that is missing.

If Btrfs aims to be at least half of what ZFS is, then it will not impose a need for fsck at all. Read No, ZFS really doesn't need a fsck at the following URL: http://www.c0t0d0s0.org/archives/6071-No,-ZFS-really-doesnt-need-a-fsck.html

Interesting idea. It would seem to me, however, that the functionality described in that article is more concerned with a bad transaction rather than something like a hardware failure, where a block written more than 128 transactions ago is now corrupted and consequently the entire partition is now unmountable (that is what I think I am looking at with BTRFS).

In the ZFS case, this is handled by checksumming and redundant data, and can be discovered (and fixed) either by reading the affected data block (in which case the checksum is wrong, the data is read from a redundant data block, and the correct data is written over the incorrect data) or by running a scrub. Self-healing, checksumming, and data redundancy eliminate the need for online (or offline) fsck. Automatic transaction rollback at boot eliminates the need for fsck at boot, as there is no such thing as a dirty filesystem. Either the data is on disk and correct, or it doesn't exist. Yes, you may lose data, but you will never have a corrupted filesystem.

Not sure how things work for btrfs.

-- Freddie Cash fjwc...@gmail.com
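The read-path healing described there can be mimicked in userspace as a toy: keep a checksum plus a redundant copy, and rewrite any copy whose hash no longer matches. The file names are invented for the demo — ZFS does this per block, transparently, inside the filesystem:

```shell
#!/bin/sh
# Toy self-healing read: two mirrored copies of a "block" plus the
# expected sha256. If a copy fails its checksum on read, repair it
# from the surviving good copy.

printf 'important data' > /tmp/copyA
cp /tmp/copyA /tmp/copyB
expected=$(sha256sum /tmp/copyA | awk '{print $1}')

printf 'silent bit rot' > /tmp/copyA     # simulate corruption behind our back

actual=$(sha256sum /tmp/copyA | awk '{print $1}')
if [ "$actual" != "$expected" ]; then
    echo "checksum mismatch on copyA; healing from copyB"
    cp /tmp/copyB /tmp/copyA             # self-heal from the redundant copy
fi

cat /tmp/copyA ; echo
```

A scrub is just this loop run proactively over every block instead of waiting for a read to trip over the bad one.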