The next topic is the pool structure in 4.x.

Here are the differences in pool file storage between 3.x and 4.x:

 - The digest changes from a partial MD4 digest to a full-file MD5
   digest.  This will significantly reduce pool collisions; in almost
   all installations there will be none.  The most likely exception is
   someone backing up the now well-known constructed examples of
   different files that share an MD5 digest.

   In 3.x a partial MD4 digest is used, so collisions are more
   common.  The file system's hardlink limit can also cause more
   entries in a pool file chain, since a pool file that reaches the
   limit needs another copy in the chain.  In 4.x reference counting
   is done using a simple database, so the file system hardlink limit
   isn't relevant.

 - If pool files do collide, a chain is created by appending one or
   more bytes to the MD5 digest as a counter.  The first instance of a
   pool file has the regular 16-byte digest.  The next file that is
   different but has the same MD5 digest is stored with a 17-byte
   digest whose extra byte is 0x01.  The file with index 256 (the 257th
   file in the chain, unlikely of course) has two bytes appended:
   0x0100.  The extension is basically the file index with leading
   0x00 bytes removed.

 - 4.x doesn't use hardlinks (except as inherited from existing 3.x
   pools).

 - In 4.x pool files are never renamed.  In 3.x, pool files in a chain
   of repeated digests are renamed if one of the middle files is
   deleted.  In the unlikely event that there is a chain of repeated
   digests in 4.x and one of the files is deleted (i.e., no longer
   referenced), it is replaced by a zero-length file.  That acts as a
   tag that searching through the chain should continue past that
   point, and also as a tag that the slot can be reused by a real pool
   file when the next file is added.

 - In 4.x the pool files are stored two levels deep, with 128
   directories at each level.  The directories are numbered in hex
   from 00 to fe in steps of 2.  The directory names are based on
   the first two bytes of the MD5 digest, each ANDed with 0xfe.
   For example, a file with digest 0458d9d0e9ddd2b6b21a1e60b6cdf323
   will be stored in:

       CPOOL_DIR/04/58/0458d9d0e9ddd2b6b21a1e60b6cdf323

   while a file with digest 09682c6df94c87b1e9ee6e1d0d89e8f2 will be
   stored in:

       CPOOL_DIR/08/68/09682c6df94c87b1e9ee6e1d0d89e8f2

   (notice that 0x09 & 0xfe == 0x08).

   In 3.x the directories are three levels deep, with 16 directories
   at each level based on the first 3 hex digits of the partial
   MD4 digest.  So in 3.x there are 16^3 = 4096 leaf directories,
   while in 4.x there are 128 * 128 = 16384 leaf directories.

 - The 3.x and 4.x CPOOL_DIR is the same.  The two trees below it stay
   separate because of the directory naming conventions: 3.x uses
   single hex-digit directory names, while 4.x uses two hex digits.

 - In 4.x, pool file matching requires the full-file MD5 digest.
   There is also a flag, $bpc->{PoolV3}, that determines whether old
   3.x pool files should be checked too.  Currently that flag is
   hardcoded, and I need to make it autodetect whether there are any
   old pool files (I guess based on BackupPC_nightly?).  If PoolV3 is
   set and there are no candidate 4.x files, then the old digest is
   computed too and 3.x candidate pool files are also checked for
   matches.

   If an old 3.x pool file is matched, that file is renamed to the
   corresponding 4.x pool file path (based on the MD5 digest).
   This file might still have multiple hardlinks due to the existing
   3.x backups.  As those backups are expired, eventually the link
   count on the pool file will decrease to 1.

 - For backing up the BackupPC store in a mixed V3/V4 environment, it
   should be possible to just copy the new V4 pool and new V4 backups
   (without worrying about hardlinks that might remain on pool files
   from V3 backups).  However, I need to devise a way of determining
   the paths of the V4 backups.  Perhaps I should add a utility that
   lists all the directories that should be backed up?
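
The collision-chain naming rule from the list above can be sketched in Python.  This is a minimal illustration, not BackupPC's own code: the function name `digest_with_ext` is hypothetical, index 0 means the first file in the chain, and the counter is assumed to be written big-endian (as the 0x0100 example suggests).

```python
import hashlib


def digest_with_ext(md5_digest: bytes, index: int) -> bytes:
    """Hypothetical helper: pool digest for the file at `index` in a chain.

    Index 0 keeps the plain 16-byte MD5 digest; later indexes append the
    index in big-endian order with leading 0x00 bytes removed.
    """
    if len(md5_digest) != 16:
        raise ValueError("expected a 16-byte MD5 digest")
    if index < 0:
        raise ValueError("index must be non-negative")
    # Use the minimal byte length for the index; for index 0 that length
    # is 0, so no extension bytes are appended.
    ext = index.to_bytes((index.bit_length() + 7) // 8, "big")
    return md5_digest + ext


base = hashlib.md5(b"example file contents").digest()
for i in (0, 1, 255, 256):
    print(i, digest_with_ext(base, i)[16:].hex() or "(no extension)")
```

Run as-is, this prints no extension for index 0, then 01, ff and 0100, so the digest only grows to 18 bytes once the index needs a second byte.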
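
The zero-length placeholder behavior for 4.x chains can be modeled in memory.  Everything here is an assumption made for illustration: a dict maps chain index to file contents, a missing index ends the chain, `b""` stands for a zero-length placeholder, and `release` and `lookup` are hypothetical names.  Because an empty file would look like a placeholder in this model, empty data is rejected.

```python
from typing import Dict, Tuple


def release(chain: Dict[int, bytes], index: int) -> None:
    """Mark a no-longer-referenced entry as free without renaming others."""
    chain[index] = b""  # zero-length placeholder


def lookup(chain: Dict[int, bytes], data: bytes) -> Tuple[bool, int]:
    """Search one collision chain for `data`.

    Returns (True, index) for an identical file; otherwise (False, index),
    where index is the slot a new file should use: the first placeholder
    if any, else the first unused index.
    """
    if not data:
        raise ValueError("empty data is indistinguishable from a placeholder here")
    first_free = None
    i = 0
    while i in chain:  # a missing index ends the chain
        if chain[i] == b"":
            # Placeholder: remember it for reuse, but keep searching past it.
            if first_free is None:
                first_free = i
        elif chain[i] == data:
            return True, i
        i += 1
    return False, i if first_free is None else first_free


chain = {0: b"first", 1: b"second", 2: b"third"}
release(chain, 1)
print(lookup(chain, b"third"))   # (True, 2): search continued past slot 1
print(lookup(chain, b"fourth"))  # (False, 1): a new file reuses slot 1
```

Keeping the placeholder instead of shifting later entries down is what lets 4.x avoid the renames that 3.x performs.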
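
The 4.x directory layout can be computed directly from the digest bytes.  In this sketch `file_md5` and `pool_path_v4` are hypothetical names; `file_md5` hashes a file's contents in chunks, and `pool_path_v4` ANDs the first two digest bytes with 0xfe to choose the two directory levels.  It assumes the file name is simply the hex form of the full (possibly extended) digest, and it does not address which bytes BackupPC actually hashes for compressed pool files.

```python
import hashlib
import os


def file_md5(path: str, chunk_size: int = 1 << 20) -> bytes:
    """Full-file MD5 digest, read in chunks so large files aren't loaded at once."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()


def pool_path_v4(cpool_dir: str, digest: bytes) -> str:
    """Two directory levels from the first two digest bytes (each & 0xfe),
    then the hex digest, including any chain extension bytes, as the name."""
    level1 = f"{digest[0] & 0xFE:02x}"
    level2 = f"{digest[1] & 0xFE:02x}"
    return os.path.join(cpool_dir, level1, level2, digest.hex())


# Masking with 0xfe maps the 256 byte values onto 128 even values per level.
DIRS_PER_LEVEL = len({b & 0xFE for b in range(256)})
LEAF_DIRS = DIRS_PER_LEVEL ** 2  # 128 * 128
```

For example, `pool_path_v4("CPOOL_DIR", bytes.fromhex("09682c6df94c87b1e9ee6e1d0d89e8f2"))` gives `CPOOL_DIR/08/68/09682c6df94c87b1e9ee6e1d0d89e8f2` on a POSIX system, matching the second path example above.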
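
The PoolV3 fallback and the 3.x-to-4.x migration step might look roughly like the outline below.  It is only a sketch: `find_pool_match` is a hypothetical name; the 3.x digest computation and candidate lookup are passed in as `v3_candidates` because the partial MD4 scheme isn't spelled out here; and `same_contents` stands in for whatever comparison fits the pool (cpool files are compressed).  "No candidate 4.x files" is read as "no 4.x file with this digest".

```python
import os
from typing import Callable, Iterable, List, Optional


def find_pool_match(
    data: bytes,
    v4_candidates: List[str],
    v4_base_path: str,
    pool_v3: bool,
    v3_candidates: Callable[[bytes], Iterable[str]],
    same_contents: Callable[[str, bytes], bool],
) -> Optional[str]:
    """Return the path of a pool file matching `data`, or None.

    v4_candidates: existing 4.x pool files sharing the file's MD5 digest.
    v4_base_path:  4.x pool path for the plain 16-byte digest.
    pool_v3:       mirrors the $bpc->{PoolV3} flag.
    v3_candidates: hypothetical callback that computes the old digest and
                   yields 3.x candidate pool paths.
    same_contents: hypothetical comparison of a pool file against `data`.
    """
    # Step 1: only 4.x files with the same full-file MD5 can match.
    for path in v4_candidates:
        if same_contents(path, data):
            return path

    # Step 2: consult the 3.x pool only if PoolV3 is set and there were
    # no 4.x candidates at all.
    if v4_candidates or not pool_v3:
        return None

    for old_path in v3_candidates(data):
        if same_contents(old_path, data):
            # Move the 3.x file to its 4.x location.  A rename keeps the same
            # inode, so hardlinks from existing 3.x backups still point at it.
            os.makedirs(os.path.dirname(v4_base_path), exist_ok=True)
            os.rename(old_path, v4_base_path)
            return v4_base_path
    return None
```

Passing the 3.x lookup in as a callback keeps the old digest from being computed at all unless the fallback is actually needed, which matches the order described in the list.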

Craig

_______________________________________________
BackupPC-devel mailing list
BackupPC-devel@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-devel
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
