> It's so slow and takes so much memory because it was thought to work with
> a few hundred entries. :-D

Fair enough :D

> Wow, that's an interesting problem... I guess it can be heavily improved,
> especially if we can store some information on disk.
> Anyway, it's not an easy task: the real problem is that we don't have a
> unique ID to identify a movie (that would be the ID that we're saving... but
> the problem is matching it to the other information of the row: title, year,
> imdb_index, kind, etc. etc.)

The thing is, the whole database takes 5 GB. That's why I was wondering how the
script can eat 20 GB of memory. Maybe SQLObject leaks!
You could do it in 4 steps (a rough sketch of steps 1 and 4 follows the list):
- Grab all the information from the existing database (imdb_id, title, imdb_index,
year, kind) and store it in a temporary table or text file.
- Drop the database.
- Rebuild it.
- Iterate over your file/temp table and restore the IDs one by one.
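
Here is a minimal sketch of steps 1 and 4 in Python, assuming the MySQLdb driver
and the 'title' column names I think the schema uses (imdb_id, title, imdb_index,
production_year, kind_id) - adjust connection parameters and columns to your setup.
For the real dataset you would probably want a server-side cursor
(MySQLdb.cursors.SSCursor) instead of fetchall():

    import csv
    import MySQLdb

    # Assumed connection parameters and column names; adjust to your schema.
    db = MySQLdb.connect(user='imdb', passwd='imdb', db='imdbpy')
    cur = db.cursor()

    # Step 1: dump the IDs together with the fields needed to re-identify a row.
    cur.execute("""SELECT imdb_id, title, imdb_index, production_year, kind_id
                   FROM title WHERE imdb_id IS NOT NULL""")
    with open('title_ids.csv', 'wb') as fd:
        csv.writer(fd).writerows(cur.fetchall())

    # ... drop the database and rebuild it from the plain text dumps ...

    # Step 4: put the IDs back, matching on the identifying fields.
    # <=> is MySQL's NULL-safe equality, for rows without imdb_index/year.
    with open('title_ids.csv', 'rb') as fd:
        for imdb_id, title, index, year, kind in csv.reader(fd):
            cur.execute("""UPDATE title SET imdb_id = %s
                           WHERE title = %s AND imdb_index <=> %s
                             AND production_year <=> %s AND kind_id = %s""",
                        (imdb_id, title, index or None, year or None, kind))
    db.commit()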

But querying the fresh database against your temp table data could be slow
(because of the text fields...).
Anyway, it currently takes 10 hours to store the IDs in memory, so it can't get worse. :D

To make it faster you could also generate a unique signature for each row
(sha1(title, index, year, kind)?). Index this field and your temp table would
simply be: imdb_id | signature.
It should be quick.
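
Something along these lines, just as a sketch: the extra 'imdb_ids' schema and the
'title_sig' table name are made up, the set of hashed columns is a guess, and the
two halves would of course run separately, before and after the rebuild:

    import MySQLdb

    db = MySQLdb.connect(user='imdb', passwd='imdb', db='imdbpy')
    cur = db.cursor()

    # The map has to live outside the imdbpy database (here in a made-up
    # 'imdb_ids' schema), because the imdbpy database itself gets dropped.
    cur.execute("CREATE DATABASE IF NOT EXISTS imdb_ids")
    cur.execute("""CREATE TABLE imdb_ids.title_sig (
                       signature CHAR(40) NOT NULL,
                       imdb_id INTEGER NOT NULL,
                       INDEX (signature)
                   ) ENGINE=MyISAM""")

    # Before the rebuild: one signature per row.  COALESCE keeps NULL columns
    # from being skipped by CONCAT_WS, which would otherwise cause collisions.
    cur.execute("""INSERT INTO imdb_ids.title_sig (signature, imdb_id)
                   SELECT SHA1(CONCAT_WS('|', title, COALESCE(imdb_index, ''),
                               COALESCE(production_year, ''), kind_id)), imdb_id
                   FROM title WHERE imdb_id IS NOT NULL""")

    # After the rebuild: restore every ID with a single indexed join.
    cur.execute("""UPDATE title t
                   JOIN imdb_ids.title_sig s
                     ON s.signature = SHA1(CONCAT_WS('|', t.title,
                                           COALESCE(t.imdb_index, ''),
                                           COALESCE(t.production_year, ''),
                                           t.kind_id))
                   SET t.imdb_id = s.imdb_id""")
    db.commit()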

With MySQL you can also warm up the indexes this way:

SHOW TABLES IN imdbpy
-> for each table: LOAD INDEX INTO CACHE <table>
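
A small sketch of that loop with MySQLdb (note that LOAD INDEX INTO CACHE only
preloads MyISAM indexes; on other engines it just reports that preloading isn't
supported, so the loop is harmless either way):

    import MySQLdb

    db = MySQLdb.connect(user='imdb', passwd='imdb', db='imdbpy')
    cur = db.cursor()
    cur.execute("SHOW TABLES")
    for (table,) in cur.fetchall():
        # Each statement returns a small status result set; print it to see
        # which tables actually got their key blocks preloaded.
        cur.execute("LOAD INDEX INTO CACHE `%s`" % table)
        print(cur.fetchall())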


- Emmanuel

On 26 Jan 2012, at 17:23, Davide Alberani wrote:

> On Thu, Jan 26, 2012 at 15:32, Emmanuel Tabard <m...@webitup.fr> wrote:
>> 
>> First of all, thank you for imdbpy. It's really plug'n'play, well done!!!
> 
> Thanks. :-P
> 
>> Context:
>> - Importing the whole IMDb database (from the text dumps) - the first time it's fast and OK
>> - I have the IMDb IDs for 90% of the titles and names (no need for companies and
>> characters)
> 
> That's a lot of data. :)
> 
>> My problem comes when imdbpy updates my database. It takes hours to save the
>> IMDb IDs and it consumes a *lot* of memory - almost all of my RAM (24 GB)...
>> 
>> Is there a way to optimize that step? Why does it take so much memory?
> 
> It's so slow and takes so much memory because it was thought to work with
> a few hundred entries. :-D
> 
> Wow, that's an interesting problem... I guess it can be heavily improved,
> especially if we can store some information on disk.
> Anyway, it's not an easy task: the real problem is that we don't have a
> unique ID to identify a movie (that would be the ID that we're saving... but
> the problem is matching it to the other information of the row: title, year,
> imdb_index, kind, etc. etc.)
> 
> Hmmm... I promise to think about it over the weekend.  If anyone has a
> nice solution to this problem, any hint is welcome!
> 
> -- 
> Davide Alberani <davide.alber...@gmail.com>  [PGP KeyID: 0x465BFD47]
> http://www.mimante.net/

