Hello,

These are interesting results, but I'm slightly confused.  It seems that you 
have either misinterpreted or changed the original subject (no problem), 
which was referring to a large FileSet exclusion list for backup.

If I understand correctly, you have implemented an AA (never heard of it) 
balanced tree for building the restore in memory tree.  Is that correct?


On Tuesday 30 January 2007 08:57, Rudolf Cejka wrote:
> Kern Sibbald wrote (2007/01/29):
> > Bacula will do a linear search through the exclude list.  Thus it could
> > be extremely CPU intensive.  For a large list (more than 1000 files) I
> > believe it (the list) needs to be put into a hash tree, which is code
> > that does not exist.
>
> Hello, I have implemented AA balanced tree (balanced delete is not
> yet done) and preliminary tests shown the following results:
>
> Multiple times are repeated tries of tree building when doing a restore
> since system reboot (database caching showed to be very important
> in these measurements):

Yes, it is probably also kernel caching on the Bacula side as well.  I always 
run the test 10 times sequentially, throw out the first two results and then 
discard any subsequent test more than 10% from the average, then finally make 
an average.  That usually gives reasonably consistent results.

>
> Job with 3557150 files, typial number of files in directories:
>   current dlist:      439 s   193 s   126 s   125 s
>   aa balanced tree:   438 s   196 s   123 s   122 s
>
> Job with 2627027 files, however there are four directories with
> over 100000 files per directory (306426, 124553, 124552, 113074, 25816,
> 18861, 16404, 16404, 14876, 12935, 12411, 10893, ...):
>   current dlist:      5302 s  5216 s  ... did not want to wait anymore
>   aa balanced tree:   171 s   86 s    85 s    85 s

Yes, if you are talking about the restore memory tree, the big problem is 
always directories containing large numbers (more than 10000) files.

>
> However I still do not understand, why bacula reports 3571072 FD/SD
> Files Written in the first case, so restore tree is smaller by 13922 files
> for the job, and 2767312 FD/SD Files Written in the second case, which
> is smaller by 140285 files for the job - is it good? These file numbers
> are reported in both cases, dlist and aa balanced tree.

Have you looked at using the Bacula red-black tree code?  According to the 
research that I did red-black trees are the fastest of all tree routines. 

The second question is have you examined the amount of memory required in the 
two cases to hold the whole tree (current dlink code and balanced tree code)?  
The structures needed to setup and maintain a tree (balanced or not) are 
rather more expensive in terms of bytes than a linked list.  It might be 
interesting to be able to disable any code that balances the tree.  A 
balanced tree is primarily optimized for searches rather than inserts, and 
the big time consumer for Bacula is the inserts.

>
> Regards.

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Reply via email to