On Thu, Jan 8, 2009 at 10:45 PM, Mike Castle <dalg...@gmail.com> wrote:
> On Thu, Jan 8, 2009 at 12:13 PM, Mike Castle <dalg...@gmail.com> wrote:
>>
>> If one has enough ram, a ramdisk/tmpfs of some sort might be helpful
>> (I don't have enough swap configured to test at the moment).
>
> For the curious:
>
> On a somehwat wimpy Linux machine with 1G of ram, that's been busy
> also encoding files, I set up a 5G tmpfs.
>
> Time to create:
> # TIME createIndexes() : 134 min, 24 sec.
> DONE! (in 424 minutes, 25 seconds)

So I've done some more performance analysis.

First, I remembered to build the DB on a file system that is NOT
journaled.  This actually got me to a time similar to above.  That's
not too surprising, since I know the resulting file was too big to fit
into RAM.  The time spent paging tmpfs in and out seems to be about
the same as an ext2 system.  So, turn off filesystem journaling would
be the first recommendation.  (I knew this of course, I just didn't
think it would take THAT much longer that it would actually matter for
this case.)

Second, I used the following command line option with OUT transations:
 -e 'BEGIN:PRAGMA synchronous = OFF;'  I was disappointed to find out
that didn't help.

On my third run, I used both that pragma and --sqlite-transactions and
got the following:
TOTAL TIME TO INSERT/WRITE DATA: 184 minutes, 29 seconds
EXECUTING "BEFORE_INDEXES:BEGIN TRANSACTION;"...
EXECUTING "BEGIN TRANSACTION;"... DONE!
# TIME BEFORE_INDEXES command : 0 min, 0 sec.
building database indexes (this may take a while)
# TIME createIndexes() : 75 min, 56 sec.
DONE! (in 260 minutes, 25 seconds)

real    260m28.568s
user    225m3.944s
sys     15m4.941s

That's a nearly 2h 44m savings out of 7h for my original run.

There is a lot of variability because this machine is often busy
converting movies and serving NFS, and I'm running each of these tests
only once.  But, still it looks promising.

If imdbpy2sql offered up system statistics in addition to pure wall
time, that might be nice.

I see that sqlite also offers some some different journaling options,
so I might play with those to see how they affect things.

mrc

------------------------------------------------------------------------------
This SF.net email is sponsored by:
SourcForge Community
SourceForge wants to tell your story.
http://p.sf.net/sfu/sf-spreadtheword
_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel

Reply via email to