On 07/21/11 16:35, John Stoffel wrote: > Maybe it's time for bacula to re-think it's DB schema. For example, > the Path table is horribly inefficient. It replicates redundant > data. For example, from my setup (using Mysql I admit) at home, I > have around 189,000 paths. Just the first 40 show me part of the trouble: > > | 22 | /home/john/src/CueCat/cuecat-0.8.0/contrib/ > | 23 | /home/john/src/CueCat/cuecat-0.8.0/cuecat_RS232_pod/images/ > | 24 | /home/john/src/CueCat/cuecat-0.8.0/cuecat_RS232_pod/ > | 25 | /home/john/src/CueCat/cuecat-0.8.0/old/ > | 26 | > /home/john/src/CueCat/cuecat-0.8.0/patched_kernel_files/drivers/char/ > | 27 | > /home/john/src/CueCat/cuecat-0.8.0/patched_kernel_files/drivers/input/ > | 28 | /home/john/src/CueCat/cuecat-0.8.0/patched_kernel_files/drivers/ > | 29 | /home/john/src/CueCat/cuecat-0.8.0/patched_kernel_files/init/ > | 30 | /home/john/src/CueCat/cuecat-0.8.0/patched_kernel_files/ > | 31 | /home/john/src/CueCat/cuecat-0.8.0/ > | 32 | /home/john/src/CueCat/foocat-barcode-0.1.3.1/contrib/ > | 33 | /home/john/src/CueCat/foocat-barcode-0.1.3.1/ > | 34 | /home/john/src/CueCat/ > | 35 | /home/john/src/Cyclades/cyc_async-6.5.5/common/cyclades-z/ > | 36 | /home/john/src/Cyclades/cyc_async-6.5.5/common/cyclom-y/ > | 37 | /home/john/src/Cyclades/cyc_async-6.5.5/common/lib/ > > Just for lines 22-34, you have a *ton* of redundant info. It should > instead be a tree structure with the Schema of: > > Field Type > --------- ------- > PathID int(10) > ParentID int(10) > Path blob > > and you'd also make traversing the structure much more efficient and > simpler. And the size of those BLOB Path entries would shrink as > well, more than offsetting the size of the ParentID you add.
Except that every path retrieval would then become a recursive query with depth equal to the number of elements in the pathname, to walk through the parent of the parent of the parent of the parent of the parent of the parent of the parent of the parent of the parent of the parent of the path. Not only that, but since you carry only parent information, it would mean that whenever building a directory tree, you would have to scan the entire path table to identify leaf nodes (nodes which no other node has as a parent), then build the tree from the bottom up starting from the leaf nodes. Worse, whenever adding a new node, you'd have to do pretty much the same thing, to make certain you were adding drivers/ as a child node of the RIGHT patched_kernel_files/ node. The only way to avoid this would be to have each node have a variable-length list of child nodes, and you'd still have to walk the tree node by node to figure out what node to make the new node a child of. Compare this to NOW, where you just look to see if the path exists, and if not, you insert it. Done. I'm sorry, but you really didn't think this through. This is a horrifically bad idea. Trust me, it's MUCH more efficient to store and retrieve the full path *once*. What you're proposing would have the exact opposite effect of the "optimize for restores" approach you're suggesting: In order to save a little storage, it would dramatically slow down both backups AND restores. -- Phil Stracchino, CDK#2 DoD#299792458 ICBM: 43.5607, -71.355 ala...@caerllewys.net ala...@metrocast.net p...@co.ordinate.org Renaissance Man, Unix ronin, Perl hacker, SQL wrangler, Free Stater It's not the years, it's the mileage. ------------------------------------------------------------------------------ 5 Ways to Improve & Secure Unified Communications Unified Communications promises greater efficiencies for business. UC can improve internal communications as well as offer faster, more efficient ways to interact with customers and streamline customer service. Learn more! http://www.accelacomm.com/jaw/sfnl/114/51426253/ _______________________________________________ Bacula-devel mailing list Bacula-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-devel