Re: [Imdbpy-devel] [sql] memory consumption

Emmanuel Tabard Thu, 26 Jan 2012 09:50:58 -0800

A few stats:

RESTORING imdbID values for movies... DONE! (restored 1644956 entries out of 1952428)
RESTORING imdbID values for people... DONE! (restored 3304069 entries out of 3320213)
# TIME fushing caches... : 90min, 23sec (wall) 73min, 59sec (user) 1min, 37sec (system)
# TIME TOTAL TIME TO INSERT/WRITE DATA : 1193min, 58sec (wall) 1095min, 16sec (user) 13min, 7sec (system)
building database indexes (this may take a while)
# TIME createIndexes() : 13min, 56sec (wall) 0min, 0sec (user) 0min, 0sec (system)
adding foreign keys (this may take a while)
# TIME createForeignKeys() : 16min, 5sec (wall) 0min, 0sec (user) 0min, 0sec (system)
# TIME FINAL : 1223min, 59sec (wall) 1095min, 16sec (user) 13min, 7sec (system)



Note the restore rates:
 - title: 84% restored
 - name: 99% restored

I haven't looked at the diffs, though, so I don't know whether the restore
fails in some cases or whether IMDb simply has a lot of edits :)
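
For reference, the percentages above can be recomputed from the counts in the
log. This is only a small sketch; the dictionary layout and the helper name
success_rate are made up for illustration, and the numbers are copied from the
two "RESTORING imdbID values" lines.

```python
# Counts copied from the "RESTORING imdbID values" log lines:
# (restored entries, total entries).
restored = {
    "movies": (1644956, 1952428),
    "people": (3304069, 3320213),
}


def success_rate(done, total):
    """Return the share of restored entries as a percentage."""
    return 100.0 * done / total


for kind, (done, total) in restored.items():
    rate = success_rate(done, total)
    print("%s: %.1f%% restored, %d entries not restored" % (kind, rate, total - done))
```

The absolute number of unrestored entries is printed as well, since for the
movies it is large enough that looking at the diffs would be worthwhile.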


- Emmanuel
On 26 Jan 2012, at 18:42, Emmanuel Tabard wrote:

>> 
>> It's so slow and takes so much memory because it was thought to work with
>> a few hundreds of entries. :-D
> 
> Fair enough :D
> 
>> Wow, that's an interesting problem... I guess it can be heavily improved,
>> especially if we can store some information to the disc.
>> Anyway, it's not an easy task: the real problem is that we don't have a
>> unique ID to identify a movie (that would be the ID that we're saving... but
>> the problem is matching it to the other information of the row: title, year,
>> imdb_index, kind, etc. etc.)
> 
> The thing is, the whole database only takes 5 GB. That's why I was wondering
> how the script can eat 20 GB of memory. Maybe SQLObject leaks!
> You could do it in 4 steps:
> - Grab all the information from the existing database (imdb id, title, index,
> year, kind) and store it in a temporary table or text file.
> - Drop the database.
> - Rebuild it.
> - Iterate over your file/temp table and restore the ids one by one.
> 
> But querying the fresh database with the data from your temp table could be
> slow (because of the text fields...).
> Anyway, it currently takes 10 hours to store the ids in memory, so it can't
> be worse :D
> 
> To make it faster you could also generate a unique signature for each row
> (sha1(title, index, year, kind)?). Index this field, and your temp table
> would be: imdbid | signature.
> Lookups should then be quick.
> 
> With MySQL you can also warm up the indexes this way:
> 
> SHOW TABLES IN imdbpy
> -> for each table: LOAD INDEX INTO CACHE table
> 
> 
> - Emmanuel
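
The signature idea from the quoted message could be sketched roughly as
follows. This is only an illustration under assumptions, not how imdbpy2sql.py
works: the helper names (row_signature, save_ids, restore_ids) and the
saved_ids table are hypothetical, the column names of the title table are
assumed, and an in-memory SQLite database stands in for the real one. In
practice the saved pairs would have to live outside the database that gets
dropped.

```python
import hashlib
import sqlite3


def row_signature(title, imdb_index, year, kind):
    """SHA-1 over the fields used to recognise a row; None becomes ''."""
    parts = ["" if p is None else str(p) for p in (title, imdb_index, year, kind)]
    # Join with a control character that is unlikely to occur in the fields.
    return hashlib.sha1("\x1f".join(parts).encode("utf-8")).hexdigest()


def save_ids(conn):
    """Step 1: store (imdb_id, signature) pairs in an indexed side table."""
    conn.execute("CREATE TABLE saved_ids (imdb_id INTEGER, signature TEXT)")
    rows = conn.execute(
        "SELECT imdb_id, title, imdb_index, production_year, kind_id "
        "FROM title WHERE imdb_id IS NOT NULL").fetchall()
    conn.executemany(
        "INSERT INTO saved_ids VALUES (?, ?)",
        [(r[0], row_signature(*r[1:])) for r in rows])
    conn.execute("CREATE INDEX saved_ids_sig ON saved_ids (signature)")


def restore_ids(conn):
    """Step 4: look up each rebuilt row by signature and put its id back."""
    count = 0
    rows = conn.execute(
        "SELECT id, title, imdb_index, production_year, kind_id "
        "FROM title").fetchall()
    for row in rows:
        hit = conn.execute(
            "SELECT imdb_id FROM saved_ids WHERE signature = ?",
            (row_signature(*row[1:]),)).fetchone()
        if hit:
            conn.execute(
                "UPDATE title SET imdb_id = ? WHERE id = ?", (hit[0], row[0]))
            count += 1
    return count


# Tiny demo with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE title (id INTEGER PRIMARY KEY, title TEXT, imdb_index TEXT, "
    "production_year INTEGER, kind_id INTEGER, imdb_id INTEGER)")
conn.execute("INSERT INTO title VALUES (1, 'Some Movie', NULL, 2001, 1, 123)")
save_ids(conn)

# Steps 2 and 3 (drop and rebuild), simulated: rows come back with new
# primary keys and no imdb_id.
conn.execute("DELETE FROM title")
conn.execute("INSERT INTO title VALUES (7, 'Some Movie', NULL, 2001, 1, NULL)")
conn.execute("INSERT INTO title VALUES (8, 'Another Movie', NULL, 2002, 1, NULL)")

n_restored = restore_ids(conn)
print("restored %d of 2 rows" % n_restored)
```

Only the short signature column is indexed and compared, so the lookup never
has to match on the long text fields directly. A row whose title, index, year
or kind changed between dumps gets a different signature and is not restored,
which is the same limitation the current matching has.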


_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel
