On 07/21/11 16:35, John Stoffel wrote:
> Maybe it's time for bacula to re-think it's DB schema.  For example,
> the Path table is horribly inefficient.  It replicates redundant
> data.  For example, from my setup (using Mysql I admit) at home, I
> have around 189,000 paths.  Just the first 40 show me part of the trouble:
> 
> |     22 | /home/john/src/CueCat/cuecat-0.8.0/contrib/
> |     23 | /home/john/src/CueCat/cuecat-0.8.0/cuecat_RS232_pod/images/
> |     24 | /home/john/src/CueCat/cuecat-0.8.0/cuecat_RS232_pod/
> |     25 | /home/john/src/CueCat/cuecat-0.8.0/old/
> |     26 | 
> /home/john/src/CueCat/cuecat-0.8.0/patched_kernel_files/drivers/char/
> |     27 | 
> /home/john/src/CueCat/cuecat-0.8.0/patched_kernel_files/drivers/input/
> |     28 | /home/john/src/CueCat/cuecat-0.8.0/patched_kernel_files/drivers/
> |     29 | /home/john/src/CueCat/cuecat-0.8.0/patched_kernel_files/init/
> |     30 | /home/john/src/CueCat/cuecat-0.8.0/patched_kernel_files/
> |     31 | /home/john/src/CueCat/cuecat-0.8.0/
> |     32 | /home/john/src/CueCat/foocat-barcode-0.1.3.1/contrib/
> |     33 | /home/john/src/CueCat/foocat-barcode-0.1.3.1/
> |     34 | /home/john/src/CueCat/
> |     35 | /home/john/src/Cyclades/cyc_async-6.5.5/common/cyclades-z/
> |     36 | /home/john/src/Cyclades/cyc_async-6.5.5/common/cyclom-y/
> |     37 | /home/john/src/Cyclades/cyc_async-6.5.5/common/lib/
> 
> Just for lines 22-34, you have a *ton* of redundant info.  It should
> instead be a tree structure with the Schema of:
> 
>         Field     Type
>         --------- -------
>       PathID    int(10)
>       ParentID  int(10)
>       Path      blob
> 
> and you'd also make traversing the structure much more efficient and
> simpler.  And the size of those BLOB Path entries would shrink as
> well, more than offsetting the size of the ParentID you add.

Except that every path retrieval would then become a recursive query
with depth equal to the number of elements in the pathname, to walk
through the parent of the parent of the parent of the parent of the
parent of the parent of the parent of the parent of the parent of the
parent of the path.  Not only that, but since you carry only parent
information, it would mean that whenever building a directory tree, you
would have to scan the entire path table to identify leaf nodes (nodes
which no other node has as a parent), then build the tree from the
bottom up starting from the leaf nodes.  Worse, whenever adding a new
node, you'd have to do pretty much the same thing, to make certain you
were adding drivers/ as a child node of the RIGHT patched_kernel_files/
node.  The only way to avoid this would be to have each node have a
variable-length list of child nodes, and you'd still have to walk the
tree node by node to figure out what node to make the new node a child of.

Compare this to NOW, where you just look to see if the path exists, and
if not, you insert it.  Done.

I'm sorry, but you really didn't think this through.  This is a
horrifically bad idea.  Trust me, it's MUCH more efficient to store and
retrieve the full path *once*.

What you're proposing would have the exact opposite effect of the
"optimize for restores" approach you're suggesting:  In order to save a
little storage, it would dramatically slow down both backups AND restores.


-- 
  Phil Stracchino, CDK#2     DoD#299792458     ICBM: 43.5607, -71.355
  ala...@caerllewys.net   ala...@metrocast.net   p...@co.ordinate.org
  Renaissance Man, Unix ronin, Perl hacker, SQL wrangler, Free Stater
                 It's not the years, it's the mileage.

------------------------------------------------------------------------------
5 Ways to Improve & Secure Unified Communications
Unified Communications promises greater efficiencies for business. UC can 
improve internal communications as well as offer faster, more efficient ways
to interact with customers and streamline customer service. Learn more!
http://www.accelacomm.com/jaw/sfnl/114/51426253/
_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel

Reply via email to