A few stats:
RESTORING imdbID values for movies... DONE! (restored 1644956 entries out of
1952428)
RESTORING imdbID values for people... DONE! (restored 3304069 entries out of
3320213)
# TIME fushing caches... : 90min, 23sec (wall) 73min, 59sec (user) 1min, 37sec
(system)
# TIME TOTAL TIME TO INSERT/WRITE DATA : 1193min, 58sec (wall) 1095min, 16sec
(user) 13min, 7sec (system)
building database indexes (this may take a while)
# TIME createIndexes() : 13min, 56sec (wall) 0min, 0sec (user) 0min, 0sec
(system)
adding foreign keys (this may take a while)
# TIME createForeignKeys() : 16min, 5sec (wall) 0min, 0sec (user) 0min, 0sec
(system)
# TIME FINAL : 1223min, 59sec (wall) 1095min, 16sec (user) 13min, 7sec (system)
Note the success rates:
- title: 84%
- name: 99%
I haven't looked at the diffs, though, so I don't know whether the restore is
failing somehow or whether IMDb simply has a lot of edits :)
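
For reference, the percentages come straight from the counts in the log. A
small sketch of the arithmetic (the function name is only for illustration):

```python
# Counts copied from the "RESTORING imdbID values" lines of the log.
RESTORED = {
    "title": (1644956, 1952428),
    "name": (3304069, 3320213),
}


def success_rate(restored_count, total):
    """Return the share of restored entries as a percentage."""
    return 100.0 * restored_count / total


for table, (ok, total) in RESTORED.items():
    print("%s: %.2f%% restored, %d entries missing"
          % (table, success_rate(ok, total), total - ok))
```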
- Emmanuel
On 26 Jan 2012, at 18:42, Emmanuel Tabard wrote:
>>
>> It's so slow and takes so much memory because it was designed to work with
>> a few hundred entries. :-D
>
> Fair enough :D
>
>> Wow, that's an interesting problem... I guess it can be heavily improved,
>> especially if we can store some information on disk.
>> Anyway, it's not an easy task: the real problem is that we don't have a
>> unique ID to identify a movie (that would be the ID that we're saving... but
>> the problem is matching it to the other information of the row: title, year,
>> imdb_index, kind, etc. etc.)
>
> The thing is, the whole database takes 5 GB. That's why I was wondering how
> the script can eat 20 GB of memory. Maybe SQLObject leaks!
> You could do it in 4 steps:
> - Grab all the information from the existing database (imdb id, title, index,
> year, kind) and store it in a temporary table or text file.
> - Drop the database.
> - Rebuild it.
> - Iterate over your file/temp table and restore the ids one by one.
>
> But querying the fresh database with the temp table data could be slow
> (because of the text fields...).
> Anyway, it takes 10 hours to store the ids in memory. Can't be worse :D
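
A minimal sketch of steps 1 and 4 of the list above, assuming a text file as
the temporary store. It uses sqlite3 only so that it runs anywhere; the column
names (imdb_id, title, imdb_index, production_year, kind_id) and both function
names are assumptions for illustration, not the actual imdbpy2sql.py code. The
NULL-safe "IS ?" comparison is SQLite syntax (MySQL would use "<=>").

```python
import csv
import sqlite3

KEY_COLUMNS = "title, imdb_index, production_year, kind_id"


def dump_ids(conn, path):
    """Step 1: write every known imdb_id and its identifying columns to a file."""
    rows = conn.execute(
        "SELECT imdb_id, %s FROM title WHERE imdb_id IS NOT NULL" % KEY_COLUMNS)
    with open(path, "w", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerows(rows)  # NULL values are written as empty fields


def restore_ids(conn, path):
    """Step 4: stream the file and put the ids back, one row at a time."""
    restored = 0
    with open(path, newline="", encoding="utf-8") as fh:
        for imdb_id, title, index, year, kind in csv.reader(fh):
            cur = conn.execute(
                "UPDATE title SET imdb_id = ? "
                "WHERE title = ? AND imdb_index IS ? "
                "AND production_year IS ? AND kind_id = ?",
                (int(imdb_id), title, index or None,
                 int(year) if year else None, int(kind)))
            restored += cur.rowcount
    return restored
```

Only one row is held in memory at a time, so memory use no longer depends on
the size of the database. The UPDATE still matches on the text columns, which
is the slow part mentioned above.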
>
> To make it faster you can also generate a unique signature for each row
> (sha1(title, index, year, kind)?). Index this field, and your temp table
> would be: imdbid | signature.
> It should be quick.
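
A possible shape for such a signature, as a sketch only: the function name and
the choice of a NUL byte as field separator are assumptions, not something
IMDbPY defines.

```python
import hashlib


def row_signature(title, imdb_index, year, kind):
    """Return a SHA-1 hex digest built from the fields that identify a row."""
    parts = ["" if value is None else str(value)
             for value in (title, imdb_index, year, kind)]
    # Joining on a byte that cannot appear in the fields keeps
    # ("ab", "c") and ("a", "bc") from producing the same digest.
    return hashlib.sha1("\x00".join(parts).encode("utf-8")).hexdigest()


# The temp table then reduces to: signature -> imdbid.
saved_ids = {row_signature("Some Title", None, 1999, "movie"): 123456}
print(saved_ids.get(row_signature("Some Title", None, 1999, "movie")))
```

The digest has a fixed length of 40 hex characters, so it can sit in an indexed
CHAR(40) column instead of matching on the variable-length text fields.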
>
> With MySQL you can also warm up the indexes this way:
>
> SHOW TABLES IN imdbpy
> -> for each table: LOAD INDEX INTO CACHE table
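
The loop above could be scripted roughly like this. It is a sketch that assumes
a DB-API cursor from whatever MySQL driver is in use; both function names are
made up for illustration. Note that LOAD INDEX INTO CACHE only applies to
MyISAM tables.

```python
def warmup_statements(tables):
    """Build one LOAD INDEX INTO CACHE statement per table name."""
    # Backticks inside a name are doubled so the identifier stays valid.
    return ["LOAD INDEX INTO CACHE `%s`" % name.replace("`", "``")
            for name in tables]


def warm_up(cursor, database="imdbpy"):
    """List the tables of `database` and preload the indexes of each one."""
    cursor.execute("SHOW TABLES IN `%s`" % database.replace("`", "``"))
    tables = [row[0] for row in cursor.fetchall()]
    for statement in warmup_statements(tables):
        cursor.execute(statement)
        cursor.fetchall()  # the statement returns a status row; consume it
    return tables
```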
>
>
> - Emmanuel
_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel