Hey!

> Ehi, snowstorms buy you a lot of free time... :-P

:D

> How it works:
> - titles/names with imdbID are stored in a dbm database, using their
> md5 as keys.
> - at restore time, imdbIDs are restored in batches of 10000 each time.

Looks nice!!!
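
Just to check that I understand the restore step, here is a rough sketch of how one batch could be turned into a single UPDATE. This is not the code from the repository: the helper names (batches, build_batch_update), the table name and the use of sqlite3 with "?" placeholders are my assumptions for illustration. Only the md5sum/imdb_id columns and the CASE ... WHERE ... IN shape come from your mail.

```python
import sqlite3
from itertools import islice


def batches(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_batch_update(table, pairs):
    """Return (sql, params) for one UPDATE ... CASE statement.

    `pairs` is a list of (md5sum, imdb_id) tuples. `table` is
    interpolated directly, so it must be a trusted identifier.
    """
    whens = " ".join("WHEN ? THEN ?" for _ in pairs)
    marks = ", ".join("?" for _ in pairs)
    sql = ("UPDATE %s SET imdb_id = CASE md5sum %s END "
           "WHERE md5sum IN (%s)" % (table, whens, marks))
    # First the WHEN/THEN values in order, then the md5s for the IN list.
    params = [v for pair in pairs for v in pair] + [m for m, _ in pairs]
    return sql, params


def restore_ids(conn, table, pairs, batch_size=10000):
    """Run one UPDATE per batch of (md5sum, imdb_id) pairs."""
    for chunk in batches(pairs, batch_size):
        sql, params = build_batch_update(table, chunk)
        conn.execute(sql, params)
    conn.commit()
```

Each pair costs three bound parameters here, so the practical batch size also depends on the parameter limit of the database driver, not only on the query length.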

It seems that you load all the data into memory before storing it in the
temp databases:
"cls.select(ISNOTNULL(cls.q.imdbID))"

Maybe you should save the imdbIDs in batches of 10000 entries?
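
Something along these lines is what I have in mind; a minimal sketch only, assuming Python 3's dbm module and a hypothetical store_ids helper. The "rows" argument stands in for whatever iterator the ORM gives back, and I make no claim here about how cls.select() itself fetches its results.

```python
import dbm


def store_ids(rows, db_path, sync_every=10000):
    """Write (md5sum, imdb_id) pairs to a dbm file one at a time.

    `rows` can be any iterator, so this function never needs the whole
    result set in memory. `sync_every` only controls how often pending
    writes are flushed, for backends that offer sync().
    """
    count = 0
    with dbm.open(db_path, "c") as db:
        for md5sum, imdb_id in rows:
            db[md5sum] = str(imdb_id)
            count += 1
            # Not every dbm backend has sync(), hence the guard.
            if count % sync_every == 0 and hasattr(db, "sync"):
                db.sync()
    return count
```

With this shape, memory use is bounded by how the rows are produced rather than by the number of titles/names that have an imdbID.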


Tell me if you need the complete database dump to test with tons of data!

Thank you very much for your help!


On 11 Feb 2012, at 19:49, Davide Alberani wrote:

> On Tue, Feb 7, 2012 at 09:20, Davide Alberani <davide.alber...@gmail.com> 
> wrote:
>> 
>> As usual, I'm really busy right now... I hope to have time to give it
>> a look this weekend.
> 
> Ehi, snowstorms buy you a lot of free time... :-P
> 
> It was easier than I thought, mostly thanks to the fact that we already have
> md5 checksums of names and titles (a more or less recent feature).
> 
> In the mercurial repository there's a draft of a solution.
> 
> How it works:
> - titles/names with imdbID are stored in a dbm database, using their
> md5 as keys.
> - at restore time, imdbIDs are restored in batches of 10000 each time.
> 
> Notes:
> - by default, the databases are created in the current directory (and
> not deleted);
>  there's now the '-t dir' command line argument, to specify a
> temporary directory.
> - I've not tested it with huge amounts of data: if it's slow or fails,
> let me know
>  if it's while storing or restoring the IDs (and the error message).
> - 10,000 entries per batch is *totally* arbitrary: we have to choose a
> good compromise
>  between performance and the maximum size of a query.
> - the batch is executed as a single query, like:
>      UPDATE table SET imdb_id = CASE md5sum WHEN 'md5_1' THEN
> 'imdbID1' ... END WHERE md5sum IN ('md5_1', 'md5_2', ...)
>  I don't really know if this syntax is valid for every SQL database...
> - I've simplified the code, maybe too much.
> - I've not tested it with CSV support.
> 
> As usual, any test, bug report, comment and so on is welcome.
> 
> 
> -- 
> Davide Alberani <davide.alber...@gmail.com>  [PGP KeyID: 0x465BFD47]
> http://www.mimante.net/

_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel