Hello, These are interesting results, but I'm slightly confused. It seems that you have either misinterpreted or changed the original subject (no problem), which was referring to a large FileSet exclusion list for backup.
If I understand correctly, you have implemented an AA (never heard of it) balanced tree for building the restore in memory tree. Is that correct? On Tuesday 30 January 2007 08:57, Rudolf Cejka wrote: > Kern Sibbald wrote (2007/01/29): > > Bacula will do a linear search through the exclude list. Thus it could > > be extremely CPU intensive. For a large list (more than 1000 files) I > > believe it (the list) needs to be put into a hash tree, which is code > > that does not exist. > > Hello, I have implemented AA balanced tree (balanced delete is not > yet done) and preliminary tests shown the following results: > > Multiple times are repeated tries of tree building when doing a restore > since system reboot (database caching showed to be very important > in these measurements): Yes, it is probably also kernel caching on the Bacula side as well. I always run the test 10 times sequentially, throw out the first two results and then discard any subsequent test more than 10% from the average, then finally make an average. That usually gives reasonably consistent results. > > Job with 3557150 files, typial number of files in directories: > current dlist: 439 s 193 s 126 s 125 s > aa balanced tree: 438 s 196 s 123 s 122 s > > Job with 2627027 files, however there are four directories with > over 100000 files per directory (306426, 124553, 124552, 113074, 25816, > 18861, 16404, 16404, 14876, 12935, 12411, 10893, ...): > current dlist: 5302 s 5216 s ... did not want to wait anymore > aa balanced tree: 171 s 86 s 85 s 85 s Yes, if you are talking about the restore memory tree, the big problem is always directories containing large numbers (more than 10000) files. > > However I still do not understand, why bacula reports 3571072 FD/SD > Files Written in the first case, so restore tree is smaller by 13922 files > for the job, and 2767312 FD/SD Files Written in the second case, which > is smaller by 140285 files for the job - is it good? These file numbers > are reported in both cases, dlist and aa balanced tree. Have you looked at using the Bacula red-black tree code? According to the research that I did red-black trees are the fastest of all tree routines. The second question is have you examined the amount of memory required in the two cases to hold the whole tree (current dlink code and balanced tree code)? The structures needed to setup and maintain a tree (balanced or not) are rather more expensive in terms of bytes than a linked list. It might be interesting to be able to disable any code that balances the tree. A balanced tree is primarily optimized for searches rather than inserts, and the big time consumer for Bacula is the inserts. > > Regards. ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys - and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users