Re: [gentoo-server] Mirroring/backing-up a large 15Million (6TB) file collection

Björn Gustafsson Wed, 19 Apr 2006 07:03:46 -0700

jos houtman wrote:

- The collection is saved on a 9TB system.
- The backups are two off-site 4TB systems, the collections needs to be
split over these.

Now what kind of systems are these? Home-grown arrays or "real" ones? In the latter case, are there no vendor-provided approaches to this?

I'm not sure how this would apply to regular filesystems (no idea which one you use though), but in "larger" (not size-wise) systems, a bitmap of the filesystem is kept in a separate location, and disk areas with changed or added files are marked as dirty and transferred to the remote host either immediately (with synchronous i/o), as soon as possible (async i/o), or when requested (veeeery async i/o ;)). This is a rather effective system, with the backup speed mainly dependent on the size you would choose for the bitmap (large bitmap => smaller blocks => potentially less data) and the transfer speed. Restructuring of data on the physical disk would also create a major update of blocks to be transferred.

I suppose that approach on a standard Linux filesystem would require some extensive hacking of the fs code, which probably isn't the first route to try.
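
To make the bitmap trade-off concrete, here is a minimal user-space sketch in Python. It is not how any particular array or filesystem implements it; the class name DirtyBitmap, its methods, and the sizes used are all made up for illustration. It only shows why a larger bitmap (smaller blocks) can mean less data to transfer for the same set of writes.

```python
class DirtyBitmap:
    """Hypothetical tracker: one bit per fixed-size block of a volume."""

    def __init__(self, volume_size, block_size):
        self.volume_size = volume_size
        self.block_size = block_size
        # Ceiling division: a partial last block still needs its own bit.
        self.num_blocks = -(-volume_size // block_size)
        self.bits = bytearray((self.num_blocks + 7) // 8)

    def mark_write(self, offset, length):
        """Mark every block touched by a write of `length` bytes at `offset`."""
        if length <= 0:
            return
        if offset < 0 or offset + length > self.volume_size:
            raise ValueError("write is outside the volume")
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        for block in range(first, last + 1):
            self.bits[block // 8] |= 1 << (block % 8)

    def dirty_blocks(self):
        """Return the indices of all blocks marked dirty, in ascending order."""
        return [b for b in range(self.num_blocks)
                if self.bits[b // 8] & (1 << (b % 8))]

    def bytes_to_transfer(self):
        """Upper bound on the data to send: whole dirty blocks are transferred."""
        return len(self.dirty_blocks()) * self.block_size

    def clear(self):
        """Reset all bits, e.g. after the remote side confirmed the transfer."""
        self.bits = bytearray(len(self.bits))


# Three small writes on an assumed 1 MiB volume, tracked at two granularities.
writes = [(0, 100), (5000, 100), (700000, 10)]
for block_size in (65536, 4096):
    bitmap = DirtyBitmap(1024 * 1024, block_size)
    for offset, length in writes:
        bitmap.mark_write(offset, length)
    print(block_size, bitmap.dirty_blocks(), bitmap.bytes_to_transfer())
```

With 64 KiB blocks the first two writes share a block, but each dirty block costs 64 KiB to send; with 4 KiB blocks there are more dirty blocks, yet far fewer bytes in total.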

- Our backup-window is the whole day as long as this does not provide a
performance drain. Reality is that we need to use the quiet night hours
0 to 8.
- The collection is stored in a set of subdirectories each containing
50.000 files. (1-50000,50001-100000, etc). There are ~300 subdirs in use
now.

Marking folders as dirty is another solution, however 50k files is a bit big. Implementing dirty files in chunks of say 50 or 100 would be a half-way solution, but that'd be dependent on the application [see below].
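
A rough Python sketch of the chunk idea, assuming the files are identified by a running number that matches the subdirectory ranges quoted above (1-50000, 50001-100000, ...). That numbering, the function names, and the sample IDs are assumptions for illustration, not something known about the actual application.

```python
def chunk_range(file_id, chunk_size=100):
    """Return the (first, last) file number of the chunk containing file_id."""
    first = ((file_id - 1) // chunk_size) * chunk_size + 1
    return first, first + chunk_size - 1


def subdir_name(file_id, files_per_dir=50000):
    """Return the subdirectory name, following the 1-50000, 50001-100000 scheme."""
    first, last = chunk_range(file_id, files_per_dir)
    return f"{first}-{last}"


def dirty_chunks(changed_ids, chunk_size=100):
    """Collapse a list of changed file numbers into a sorted list of dirty chunks."""
    return sorted({chunk_range(file_id, chunk_size) for file_id in changed_ids})


# Hypothetical set of files changed since the last backup run.
changed = [7, 42, 50001, 50099, 123456]

chunks = dirty_chunks(changed)
dirs = sorted({subdir_name(file_id) for file_id in changed})

print(chunks)  # chunks of 100 files that need to be checked
print(dirs)    # whole subdirectories that would be dirty instead
print(len(chunks) * 100, "files to check vs", len(dirs) * 50000)
```

For these five changes, chunk-level marking leaves 300 files to look at, where directory-level marking would flag three directories of 50,000 files each.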

Only problem is constructing the list and capturing the knowledge while
it is available, two options exist:
At system level this can be done using for example I-notify, this
requires a user-daemon. If the daemon crashes changes will be missed
though.
At application (the one making the changes) level this can also be done,
when the application crashes no changes are made, so nothing is missed.
But it does require making the backup dependent on the application. Not
an ideal situation.

Sure, it's not ideal, but as you put it yourself "when the application crashes no changes are made", so there's no real loss in that case. Provided of course that nobody accidentally comments out the wrong lines of code ;)
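
For what it's worth, the application-level variant could be as small as an append-only journal of changed file numbers that the backup job reads later. The following Python sketch is only one way to do it; the journal file name, the functions record_change and read_changes, and the one-ID-per-line format are invented here and say nothing about how the real application works.

```python
import os
import tempfile


def record_change(journal_path, file_id):
    """Append one changed file number to the journal and force it to disk.

    Meant to be called by the application right after it has written the file,
    so a crash before this point means the change was never recorded as done.
    """
    with open(journal_path, "a", encoding="ascii") as journal:
        journal.write(f"{file_id}\n")
        journal.flush()
        os.fsync(journal.fileno())


def read_changes(journal_path):
    """Return the sorted, de-duplicated file numbers found in the journal."""
    try:
        with open(journal_path, encoding="ascii") as journal:
            return sorted({int(line) for line in journal if line.strip()})
    except FileNotFoundError:
        # No journal yet simply means nothing has changed.
        return []


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    journal = os.path.join(tmp, "changed-files.log")
    for file_id in (50001, 7, 50001):
        record_change(journal, file_id)
    print(read_changes(journal))
```

The backup job would feed the returned list to whatever copies the files. Rotating or truncating the journal after a successful run is deliberately left out here, since doing that safely depends on how the application and the backup job are scheduled.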

Not sure if this is of any help to you. I've mainly been involved with these kinds of setups using hardware solutions, so I'm at a loss as to how they relate to a software approach. And I'm lacking caffeine ;)


  /Björn
--
[email protected] mailing list
