Hi!
On Wed, Apr 19, 2006 at 02:08:56PM +0200, jos houtman wrote:
> current situation:
> - The collection is stored in a set of subdirectories each containing
> 50.000 files. (1-50000,50001-100000, etc). There are ~300 subdirs in use
> now.
> - Files are never deleted.
> - In the future files may change. My expectation is that at most a few
> thousand files a day will change, scattered over the whole collection
> with an emphasis on the most recent files.
[cut]
> The only problem is constructing the list and capturing the knowledge
> while it is available. Two options exist:
> At the system level this can be done using, for example, inotify; this
> requires a user-space daemon. If the daemon crashes, though, changes
> will be missed.
> At the level of the application (the one making the changes) this can
> also be done: if the application crashes, no changes are made, so
> nothing is missed. But it does make the backup dependent on the
> application, which is not an ideal situation.
First, this issue isn't Gentoo-specific, so I think it should at least be
marked [OT] in the subject. ;-)
My experience with complex backups says: it's nearly impossible to make
an effective (fast and reliable) backup of a complex application unless
that application was written with backups in mind.
In your case that means, for example, that the best solution to the
backup problem is probably to change the way files are modified, so that
a changed file isn't really CHANGED; instead, its new version is just
ADDED to the collection.
This way it will be enough for you to remember which file was backed up
last by the previous backup and continue from that file on the next
backup (I suppose all your files are numbered: "(1-50000,50001-100000, etc)").
This way the backup will depend not on the collection size (only on the
number of added files), and it will not depend on some "special feature"
in the application (like constructing a list of changed files) which may
have bugs.
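A minimal POSIX sh sketch of the "remember the last backed-up number" idea. It rests on assumptions that are not in the original mail: file number N lives in the 50,000-wide bucket directory named like 1-50000 or 50001-100000, numbers are assigned consecutively with no gaps (files are never deleted), and a small state file holds the highest number already archived. The names path_for and incremental_backup are hypothetical.

```shell
#!/bin/sh
# Sketch only. ASSUMED layout (hypothetical, not from the mail):
#   file number N lives at $COLLECTION/<lo>-<hi>/N, where <lo>-<hi> is the
#   50000-wide bucket containing N (1-50000, 50001-100000, ...), and numbers
#   are consecutive with no gaps.

PER_DIR=50000

# path_for N: print the path of file number N, relative to the collection.
path_for() {
    i=$(( ($1 - 1) / PER_DIR ))
    printf '%d-%d/%d\n' $(( i * PER_DIR + 1 )) $(( (i + 1) * PER_DIR )) "$1"
}

# incremental_backup COLLECTION STATEFILE ARCHIVE
# Archives every file numbered above the one recorded in STATEFILE, then
# records the new highest number. Nothing is written if no file was added.
incremental_backup() {
    collection=$1 state=$2 archive=$3
    last=0
    [ -s "$state" ] && last=$(cat "$state")
    list=$(mktemp) || return 1
    n=$(( last + 1 ))
    # Walk forward from the last archived number; stop at the first
    # number that does not exist yet.
    while [ -e "$collection/$(path_for "$n")" ]; do
        path_for "$n" >> "$list"
        n=$(( n + 1 ))
    done
    newlast=$(( n - 1 ))
    if [ "$newlast" -gt "$last" ]; then
        # Advance the state file only if tar succeeded.
        tar -cf "$archive" -C "$collection" -T "$list" &&
            echo "$newlast" > "$state" || { rm -f "$list"; return 1; }
    fi
    rm -f "$list"
}
```

Because the loop starts at the recorded number and stops at the first missing one, its cost grows with the number of files added since the last run, not with the size of the existing collection.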
If your application needs the newer version of a file to have the same
name as the previous version, and this behaviour can't be changed, then
you can consider a special solution: after ADDING the newer version to
the collection, replace the previous version with a symlink to the newer
version. To back up these symlinks you will need an additional step like:
find /collection -type l -print0 | xargs -0 tar ...
I've no idea whether "find -type l" will be fast enough for you, but I
suppose it will be much, much faster than rsync, simply because it
doesn't need to read all the files in the collection and calculate their
checksums.
--
WBR, Alex.
--
[email protected] mailing list