On 2017-02-08 at 14:32, Austin S. Hemmelgarn wrote:
> On 2017-02-08 08:26, Martin Raiber wrote:
>> On 08.02.2017 14:08 Austin S. Hemmelgarn wrote:
>>> On 2017-02-08 07:14, Martin Raiber wrote:
>>>> Hi,
>>>>
>>>> On 08.02.2017 03:11 Peter Zaitsev wrote:
>>>>> Out of curiosity, I see one problem here:
>>>>> If you're doing snapshots of the live database, each snapshot leaves
>>>>> the database files like killing the database in-flight. Like shutting
>>>>> the system down in the middle of writing data.
>>>>>
>>>>> This is because I think there's no API for user space to subscribe to
>>>>> events like a snapshot - unlike e.g. the VSS API (volume snapshot
>>>>> service) in Windows. You should put the database into frozen state to
>>>>> prepare it for a hotcopy before creating the snapshot, then ensure
>>>>> all data is flushed before continuing.
>>>>>
>>>>> I think I've read that btrfs snapshots do not guarantee single point
>>>>> in time snapshots - the snapshot may be smeared across a longer
>>>>> period of time while the kernel is still writing data. So parts of
>>>>> your writes may still end up in the snapshot after issuing the
>>>>> snapshot command, instead of in the working copy as expected.
>>>>>
>>>>> How is this going to be addressed? Is there some snapshot aware API
>>>>> to let user space subscribe to such events and do proper
>>>>> preparation? Is this planned? LVM could be a user of such an API,
>>>>> too. I think this could have nice enterprise-grade value for Linux.
>>>>>
>>>>> XFS has xfs_freeze and xfs_thaw for this, to prepare LVM snapshots.
>>>>> But still, also this needs to be integrated with MySQL to properly
>>>>> work. I once (years ago) researched on this but gave up on my plans
>>>>> when I planned database backups for our web server infrastructure.
>>>>> We moved to creating SQL dumps instead, although there're binlogs
>>>>> which can be used to recover to a clean and stable transactional
>>>>> state after taking snapshots. But I simply didn't want to fiddle
>>>>> around with properly cleaning up binlogs which accumulate horribly
>>>>> much space usage over time. The cleanup process requires to create a
>>>>> cold copy or dump of the complete database from time to time, only
>>>>> then it's safe to remove all binlogs up to that point in time.
>>>>
>>>> little bit off topic, but I for one would be on board with such an
>>>> effort. It "just" needs coordination between the backup
>>>> software/snapshot tools, the backed up software and the various
>>>> snapshot providers. If you look at the Windows VSS API, this would be
>>>> a relatively large undertaking if all the corner cases are taken into
>>>> account, like e.g. a database having the database log on a separate
>>>> volume from the data, dependencies between different components etc.
>>>>
>>>> You'll know more about this, but databases usually fsync quite often
>>>> in their default configuration, so btrfs snapshots shouldn't be much
>>>> behind the properly snapshotted state, so I see the advantages more
>>>> with usability and taking care of corner cases automatically.
>>> Just my perspective, but BTRFS (and XFS, and OCFS2) already provide
>>> reflinking to userspace, and therefore it's fully possible to
>>> implement this in userspace. Having a version of the fsfreeze (the
>>> generic form of xfs_freeze) stuff that worked on individual sub-trees
>>> would be nice from a practical perspective, but implementing it would
>>> not be easy by any means, and would be essentially necessary for a
>>> VSS-like API. In the meantime though, it is fully possible for the
>>> application software to implement this itself without needing anything
>>> more from the kernel.
>>
>> VSS snapshots whole volumes, not individual files (so comparable to an
>> LVM snapshot). The sub-folder freeze would be something useful in some
>> situations, but duplicating the files+extents might also take too long
>> in a lot of situations. You are correct that the kernel features are
>> there and what is missing is a user-space daemon, plus a protocol that
>> facilitates/coordinates the backups/snapshots.
>>
>> Sending a FIFREEZE ioctl, taking a snapshot and then thawing it does not
>> really help in some situations as e.g. MySQL InnoDB uses O_DIRECT and
>> manages its own buffer pool which won't get the FIFREEZE and flush, but
>> as said, the default configuration is to flush/fsync on every commit.
> OK, there's part of the misunderstanding. You can't FIFREEZE a BTRFS
> filesystem and then take a snapshot in it, because the snapshot
> requires writing to the filesystem (which the FIFREEZE would prevent,
> so a script that tried to do this would deadlock). A new version of
> the FIFREEZE ioctl would be needed that operates on subvolumes.

You can also put your filesystem on LVM and take LVM snapshots.
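A rough sketch of what the LVM route could look like from a backup script, in Python. This is only an illustration under assumptions: the volume group "vg0", the logical volume "mysql" and the snapshot name are made up, and pause/resume are hypothetical hooks where the database would be brought into a consistent state and released again (for MySQL that might mean holding FLUSH TABLES WITH READ LOCK in an open session while the snapshot is created). The only real tool involved is lvcreate with its --snapshot, --size and --name options.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def quiesced(pause, resume):
    """Run the body between pause() and resume(); resume() runs even on error."""
    pause()
    try:
        yield
    finally:
        resume()


def lvm_snapshot_argv(vg, lv, snap_name, size="5G"):
    """Build the lvcreate command line for a copy-on-write snapshot of vg/lv."""
    return ["lvcreate", "--snapshot", "--size", size,
            "--name", snap_name, f"{vg}/{lv}"]


def take_snapshot(vg, lv, snap_name, pause, resume, size="5G",
                  run=subprocess.run):
    """Pause the application, create the LVM snapshot, then resume it.

    pause and resume are caller-supplied (hypothetical) hooks; run is
    injectable so the sequence can be exercised without touching LVM.
    """
    argv = lvm_snapshot_argv(vg, lv, snap_name, size)
    with quiesced(pause, resume):
        run(argv, check=True)
    return argv
```

Run for real, this needs root and enough free extents in the volume group to hold the snapshot's copy-on-write data. The hooks are where the hard part discussed above lives: the script only guarantees that the database is released again even if lvcreate fails.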
--
Adrian Brzeziński