On Thu, Jan 8, 2009 at 10:45 PM, Mike Castle <dalg...@gmail.com> wrote: > On Thu, Jan 8, 2009 at 12:13 PM, Mike Castle <dalg...@gmail.com> wrote: >> >> If one has enough ram, a ramdisk/tmpfs of some sort might be helpful >> (I don't have enough swap configured to test at the moment). > > For the curious: > > On a somehwat wimpy Linux machine with 1G of ram, that's been busy > also encoding files, I set up a 5G tmpfs. > > Time to create: > # TIME createIndexes() : 134 min, 24 sec. > DONE! (in 424 minutes, 25 seconds)
So I've done some more performance analysis. First, I remembered to build the DB on a file system that is NOT journaled. This actually got me to a time similar to above. That's not too surprising, since I know the resulting file was too big to fit into RAM. The time spent paging tmpfs in and out seems to be about the same as an ext2 system. So, turn off filesystem journaling would be the first recommendation. (I knew this of course, I just didn't think it would take THAT much longer that it would actually matter for this case.) Second, I used the following command line option with OUT transations: -e 'BEGIN:PRAGMA synchronous = OFF;' I was disappointed to find out that didn't help. On my third run, I used both that pragma and --sqlite-transactions and got the following: TOTAL TIME TO INSERT/WRITE DATA: 184 minutes, 29 seconds EXECUTING "BEFORE_INDEXES:BEGIN TRANSACTION;"... EXECUTING "BEGIN TRANSACTION;"... DONE! # TIME BEFORE_INDEXES command : 0 min, 0 sec. building database indexes (this may take a while) # TIME createIndexes() : 75 min, 56 sec. DONE! (in 260 minutes, 25 seconds) real 260m28.568s user 225m3.944s sys 15m4.941s That's a nearly 2h 44m savings out of 7h for my original run. There is a lot of variability because this machine is often busy converting movies and serving NFS, and I'm running each of these tests only once. But, still it looks promising. If imdbpy2sql offered up system statistics in addition to pure wall time, that might be nice. I see that sqlite also offers some some different journaling options, so I might play with those to see how they affect things. mrc ------------------------------------------------------------------------------ This SF.net email is sponsored by: SourcForge Community SourceForge wants to tell your story. http://p.sf.net/sfu/sf-spreadtheword _______________________________________________ Imdbpy-devel mailing list Imdbpy-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/imdbpy-devel