jos houtman wrote:
- The collection is saved on a 9TB system.
- The backups are two off-site 4TB systems; the collection needs to be
split over these.
Now what kind of systems are these? Home-grown arrays or "real" ones? In
the latter case, are there no vendor-provided approaches to this?
I'm not sure how this would apply to regular filesystems (I have no idea
which one you use, though), but in "larger" (not size-wise) systems, a
bitmap of the filesystem is kept in a separate location, and disk areas
with changed or added files are marked as dirty and transferred to the
remote host either immediately (with synchronous I/O), as soon as
possible (async I/O), or when requested (veeeery async I/O ;)). This is a
rather effective system: the backup speed depends mainly on the transfer
speed and on the size you choose for the bitmap (larger bitmap =>
smaller blocks => potentially less data to send). Restructuring of data
on the physical disk (defragmenting it, for example) would also mark
many blocks dirty and cause a large transfer.
I suppose that approach on a standard Linux filesystem would require
some extensive hacking of the filesystem code, which probably isn't the
first route to try.
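To make the bitmap trade-off concrete, here is a minimal sketch of dirty-block tracking, assuming the storage layer can report the offset and length of every write (real arrays do this inside the controller or block layer; the `DirtyBitmap` class and its methods are hypothetical, not any vendor's API). Each bit stands for one fixed-size block, and a set bit means that block must be sent to the remote host on the next sync.

```python
class DirtyBitmap:
    """One bit per fixed-size block; a set bit means the block must be re-sent."""

    def __init__(self, device_size, block_size):
        self.device_size = device_size
        self.block_size = block_size
        nblocks = -(-device_size // block_size)  # ceiling division
        self.bits = bytearray(-(-nblocks // 8))  # 8 blocks per byte

    def mark_write(self, offset, length):
        """Mark every block touched by a write of `length` bytes at `offset`."""
        if length <= 0:
            return
        if offset < 0 or offset + length > self.device_size:
            raise ValueError("write outside the device")
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        for block in range(first, last + 1):
            self.bits[block >> 3] |= 1 << (block & 7)

    def dirty_blocks(self):
        """Yield indices of dirty blocks, skipping all-clean bytes quickly."""
        for i, byte in enumerate(self.bits):
            if byte:
                for bit in range(8):
                    if byte & (1 << bit):
                        yield i * 8 + bit

    def bytes_to_transfer(self):
        """Upper bound on bytes to send (the device's last block may be partial)."""
        return sum(1 for _ in self.dirty_blocks()) * self.block_size

    def clear(self):
        """Reset after a successful transfer to the remote host."""
        self.bits = bytearray(len(self.bits))
```

With this sketch, a single 4 KiB write costs a 1 MiB transfer when the blocks are 1 MiB, but only 64 KiB with 64 KiB blocks; in exchange, the bitmap for a 9 TiB volume grows from about 1.1 MiB to 18 MiB.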
- Our backup window is the whole day, as long as this does not cause a
performance drain. In reality we need to use the quiet night hours, from
0:00 to 8:00.
- The collection is stored in a set of subdirectories, each containing
50,000 files (1-50000, 50001-100000, etc.). There are ~300 subdirs in use
now.
Marking folders as dirty is another solution; however, 50k files is a bit
big. Tracking dirty files in chunks of, say, 50 or 100 would be a
halfway solution, but that'd be dependent on the application [see below].
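The difference between folder-level and chunk-level marking can be sketched as follows, assuming files are identified by the same integer numbers used in the subdirectory ranges (1-50000, 50001-100000, ...). `DirtyUnits` and the chunk size of 100 are hypothetical; the same class covers both granularities, only the unit size changes.

```python
FOLDER_SIZE = 50_000  # files per subdirectory, from the layout described above
CHUNK_SIZE = 100      # hypothetical chunk size ("say 50 or 100")


class DirtyUnits:
    """Remember which fixed-size ranges of file numbers contain changed files."""

    def __init__(self, unit_size):
        self.unit_size = unit_size
        # Unit i covers file numbers i*unit_size + 1 .. (i+1)*unit_size.
        self.dirty = set()

    def mark(self, file_number):
        if file_number < 1:
            raise ValueError("file numbers start at 1")
        self.dirty.add((file_number - 1) // self.unit_size)

    def ranges(self):
        """Return the (first, last) file-number ranges that must be copied."""
        return [(u * self.unit_size + 1, (u + 1) * self.unit_size)
                for u in sorted(self.dirty)]

    def files_to_copy(self):
        return len(self.dirty) * self.unit_size
```

For example, if files 17, 42 and 50123 change, folder-level marking has to copy two whole subdirectories (100,000 files), while 100-file chunks need only the ranges 1-100 and 50101-50200 (200 files).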
The only problem is constructing the list and capturing the knowledge
while it is available. Two options exist:
At the system level this can be done using, for example, inotify; this
requires a user-space daemon. If the daemon crashes, changes will be
missed, though.
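The system-level option could look roughly like this Linux-only sketch. Python's standard library has no inotify module, so it calls the C library's `inotify_init1` and `inotify_add_watch` through `ctypes`; the `DirtyWatcher` class and the chosen event mask are assumptions for illustration. inotify watches are not recursive, so each subdirectory needs its own watch. Besides a daemon crash, an overflowing kernel event queue (`IN_Q_OVERFLOW`) is a second way to miss changes, which the sketch treats as fatal.

```python
import ctypes
import os
import struct

# Event bits from <sys/inotify.h> (Linux only).
IN_CLOSE_WRITE = 0x00000008
IN_MOVED_TO = 0x00000080
IN_DELETE = 0x00000200
IN_Q_OVERFLOW = 0x00004000

_EVENT_HEADER = struct.Struct("iIII")  # wd, mask, cookie, len; name follows
_libc = ctypes.CDLL(None, use_errno=True)  # symbols of the running process (libc)


class DirtyWatcher:
    """Collect paths of changed files in a fixed set of directories."""

    def __init__(self, directories):
        self.fd = _libc.inotify_init1(os.O_NONBLOCK)  # IN_NONBLOCK == O_NONBLOCK
        if self.fd < 0:
            raise OSError(ctypes.get_errno(), "inotify_init1 failed")
        self.dirs = {}  # watch descriptor -> directory path
        mask = IN_CLOSE_WRITE | IN_MOVED_TO | IN_DELETE
        for d in directories:
            wd = _libc.inotify_add_watch(self.fd, os.fsencode(d), mask)
            if wd < 0:
                raise OSError(ctypes.get_errno(), "inotify_add_watch failed", d)
            self.dirs[wd] = d

    def poll(self):
        """Return paths changed since the last call, without blocking."""
        changed = set()
        while True:
            try:
                buf = os.read(self.fd, 64 * 1024)
            except BlockingIOError:
                break  # no more queued events
            offset = 0
            while offset < len(buf):
                wd, mask, _cookie, length = _EVENT_HEADER.unpack_from(buf, offset)
                offset += _EVENT_HEADER.size
                name = buf[offset:offset + length].rstrip(b"\0")  # NUL-padded
                offset += length
                if mask & IN_Q_OVERFLOW:
                    # The kernel dropped events: only a full scan is safe now.
                    raise RuntimeError("inotify queue overflow; changes were lost")
                if wd in self.dirs and name:
                    changed.add(os.path.join(self.dirs[wd], os.fsdecode(name)))
        return changed

    def close(self):
        os.close(self.fd)
```

A daemon built on this would call `poll()` periodically and append the results to a persistent dirty list, so that the backup job at night only has to read that list.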
At the application level (in the application making the changes) this
can also be done: when the application crashes no changes are made, so
nothing is missed. But it does make the backup dependent on the
application, which is not an ideal situation.
Sure, it's not ideal, but as you put it yourself, "when the application
crashes no changes are made", so there's no real loss in that case.
Provided, of course, that nobody accidentally comments out the wrong
lines of code ;)
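An application-level hook might look like the following minimal sketch; `save_file`, `dirty_paths` and the journal file are hypothetical names, not part of any existing application. The one design point worth noting is the ordering: the path is recorded (and fsync'ed) before the file is written, so a crash between the two steps at worst makes the backup copy a file that did not actually change, rather than silently missing one that did.

```python
import os


def save_file(path, data, journal_path):
    """Hypothetical application write path that also records the change."""
    # Journal first: a crash after this point can only cause an extra copy.
    with open(journal_path, "a", encoding="utf-8") as journal:
        journal.write(path + "\n")
        journal.flush()
        os.fsync(journal.fileno())  # make sure the record survives a crash
    with open(path, "wb") as f:
        f.write(data)


def dirty_paths(journal_path):
    """Return the set of paths recorded in the journal (empty if none yet)."""
    try:
        with open(journal_path, encoding="utf-8") as journal:
            return {line.rstrip("\n") for line in journal if line.strip()}
    except FileNotFoundError:
        return set()
```

The nightly backup would copy everything returned by `dirty_paths()`. Rotating the journal between backup runs, and coordinating that rotation with the application, is left out of this sketch.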
Not sure if this is of any help to you; I've mainly been involved with
these kinds of setups using hardware solutions, so I'm at a loss as to
how they relate to a software approach. And I'm lacking caffeine ;)
/Björn
--
[email protected] mailing list