On Friday 28 March 2008 08:53:52 Eric Bollengier wrote:
> Hello,
>
> On Friday 28 March 2008 00:16:11 Kern Sibbald wrote:
> > I ran a small test of Tokyo Cabinet DBM compared to our htable routines.
> >
> > Inputting 5 million records to htable and then reading them all back
> > takes 7.7 seconds, and uses up 240MB.
> >
> > Inputting 5 million records to TCDBM (using the same records as above)
> > and reading them back takes 1 minute 33 seconds.
>
> In our case, 1 or 2 minutes in a 5 million file backup is not too big a
> cost :) But during this time, the Director has to lock the db
> connection... so many things can be frozen.
If that becomes a problem, we can simply write the data to a file, then
reread the file and send it. That will avoid waiting for the FD to do its
indexing. The other alternative would be to spool the data in the FD. For
the moment, I would say to ignore this problem.

> Have you run this test with valid filenames?

No, it was run with dummy data.

> > Using 1 million records, it runs in 8.8 seconds.
> >
> > So, it is a bit slower at a million records, and quite a bit slower at
> > 5 million, but that can probably be tuned. In those tests, I did tune
> > it to use something like 40MB of memory. In addition, it mallocs and
> > frees each record returned. He has calls to allow the records to be
> > returned in our own buffers, so this would probably reduce the time a
> > lot.
>
> Yes, we can also probably skip some bucket re-allocation if we know in
> advance how many files we have.

Yes, he has some configuration and tuning APIs -- I used them but did not
optimize them. I'll send you my little test program offline (actually it is
his example program that I modified to add the same data creation loop as
in htable).

I think he has an API that allows you to search the system, but in any
case, it would only take about a half hour to feed all the filenames from
the system into either htable or tcdbm for doing some tests.

I think we have a lot of tests to run before we decide to use this package,
but the advantage is that if it is relatively fast, we can replace the
in-memory part of the tree code with the tcdbm.

I think the first thing to do is to use a different call to retrieve the
data so that tcdbm doesn't malloc (and we free) all 5 million records. Then
it might be worth sending the program to the author and asking if he has
any suggestions to make it run faster ...

Kern
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel
