I was wondering: why don't you use the original DBs?

Something like this takes 3 seconds:

CREATE TABLE title_extract SELECT imdb_id, md5sum FROM title WHERE imdb_id IS NOT NULL;
CREATE TABLE name_extract SELECT imdb_id, md5sum FROM name WHERE imdb_id IS NOT NULL;

Then just run your query to restore the IDs from those tables.

Should be freaking fast...
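A minimal sketch of this extract-and-restore idea, using Python's built-in sqlite3 module as a stand-in for the real database. The two-column `title` table and its rows are hypothetical, and the UPDATE is only an assumed version of the restore query, which isn't shown in this thread. Note that SQLite requires `CREATE TABLE ... AS SELECT`, while MySQL also accepts the form without `AS` used above.

```python
import sqlite3

# In-memory SQLite DB standing in for the real one (hypothetical minimal schema).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE title (id INTEGER PRIMARY KEY, md5sum TEXT, imdb_id INTEGER)")
cur.executemany(
    "INSERT INTO title (md5sum, imdb_id) VALUES (?, ?)",
    [("aaa", 111), ("bbb", None), ("ccc", 333)],
)

# 1. Before the rebuild: save the (imdb_id, md5sum) pairs that have an ID.
cur.execute(
    "CREATE TABLE title_extract AS "
    "SELECT imdb_id, md5sum FROM title WHERE imdb_id IS NOT NULL"
)

# 2. Simulate the rebuild: the table is repopulated and the IDs are lost.
cur.execute("DELETE FROM title")
cur.executemany(
    "INSERT INTO title (md5sum, imdb_id) VALUES (?, NULL)",
    [("aaa",), ("bbb",), ("ccc",), ("ddd",)],
)

# 3. Restore (assumed query): copy each saved imdb_id back onto the row
#    with the same md5sum; rows without a saved ID are left untouched.
cur.execute(
    "UPDATE title SET imdb_id = ("
    " SELECT e.imdb_id FROM title_extract AS e WHERE e.md5sum = title.md5sum"
    ") WHERE md5sum IN (SELECT md5sum FROM title_extract)"
)
conn.commit()

restored = dict(cur.execute("SELECT md5sum, imdb_id FROM title ORDER BY md5sum"))
print(restored)  # {'aaa': 111, 'bbb': None, 'ccc': 333, 'ddd': None}
```

Because both steps run entirely inside the database, no rows pass through Python at all, which is where the speed would come from.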
 
On 12 Feb 2012, at 14:56, Davide Alberani wrote:

> On Sun, Feb 12, 2012 at 14:20, Emmanuel Tabard <m...@webitup.fr> wrote:
>> 
>> Fair enough!
>> While it was selecting all the non-null IDs, the memory used by the process
>> kept growing, but the size of the .db file never grew.
>> My theory is that dbm only saves on close? Does that make sense?
> 
> Strange (although, since anydbm is a generic interface to various underlying
> modules, you can never tell).
> 
> On my system, this simple snippet creates a 1.2 GB file, and in the process
> it doesn't use much memory (apart from caches, which don't matter here):
> 
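> #!/usr/bin/env python
> # Write 100,000 values of 8 KB each to a dbm file and watch memory usage.
> import sys
> import time
> import anydbm
> 
> long_string = 'LALALALA' * 1024  # 8 KB value
> db = anydbm.open('/tmp/big.db', 'n')  # 'n': always create a new, empty db
> for x in xrange(100000):
>     db[str(x)] = long_string  # dbm keys must be strings
> 
> print 'INSERT'
> db.close()  # write any pending data and close the file
> print 'CLOSE'
> time.sleep(10)  # leave time to inspect the process and the file size
> print 'DONE'
> sys.exit()
> #======================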
> 
> I fear the leak is in the loop over the results of the 'select'. :-/
> 
> 
> -- 
> Davide Alberani <davide.alber...@gmail.com>  [PGP KeyID: 0x465BFD47]
> http://www.mimante.net/
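Since the leak is suspected to be in the loop over the SELECT results, here is a minimal Python 3 sketch (anydbm became `dbm` in Python 3) that fills a dbm file from a query in fixed-size batches using the DB-API `fetchmany()`, so only one batch of rows is held in Python at a time. sqlite3 stands in for the real database; the table, the key format and `BATCH_SIZE` are hypothetical.

```python
import dbm
import os
import sqlite3
import tempfile

# Hypothetical source table; every third row has a NULL imdb_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE title (md5sum TEXT, imdb_id INTEGER)")
conn.executemany(
    "INSERT INTO title VALUES (?, ?)",
    [("md5-%d" % i, i if i % 3 else None) for i in range(10000)],
)

path = os.path.join(tempfile.mkdtemp(), "imdbIDs")
cur = conn.execute("SELECT md5sum, imdb_id FROM title WHERE imdb_id IS NOT NULL")

# Pull rows in fixed-size batches instead of fetchall(), so the Python side
# never holds the whole result set at once.
BATCH_SIZE = 1000  # hypothetical value; tune for the real data
with dbm.open(path, "n") as db:  # 'n': always create a new, empty db
    while True:
        rows = cur.fetchmany(BATCH_SIZE)
        if not rows:
            break
        for md5sum, imdb_id in rows:
            db[md5sum] = str(imdb_id)  # dbm stores only str/bytes

# Reopen read-only to check what was written (values come back as bytes).
with dbm.open(path, "r") as db:
    count = len(db)
    sample = db["md5-1"]
```

One caveat: with MySQLdb the default cursor buffers the whole result set on the client, so batching alone may not help there; a server-side cursor such as `MySQLdb.cursors.SSCursor` would likely be needed for the same effect.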


_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel
