I was wondering: why don't you use the original DBs? Something like this takes 3 seconds:

CREATE TABLE title_extract SELECT imdb_id, md5sum FROM title WHERE imdb_id IS NOT NULL;
CREATE TABLE name_extract SELECT imdb_id, md5sum FROM name WHERE imdb_id IS NOT NULL;

Then use your query to restore. It should be freaking fast...

On 12 Feb 2012, at 14:56, Davide Alberani wrote:

> On Sun, Feb 12, 2012 at 14:20, Emmanuel Tabard <m...@webitup.fr> wrote:
>>
>> Fair enough!
>> While it was selecting all the non-null ids, the memory of the process grew
>> and the size of the .db never grew.
>> My theory is that dbm saves on close? Does that make sense?
>
> Strange (even if, anydbm being a generic interface to various underlying
> modules, you can never tell).
>
> This simple snippet, on my system, creates a 1.2 GB file, and in the process
> not much memory is used (besides caches, but that doesn't matter):
>
> #!/usr/bin/env python
> import sys
> import time
> import anydbm
>
> long_string = 'LALALALA' * 1024
> db = anydbm.open('/tmp/big.db', 'n')
> for x in xrange(100000):
>     x = str(x)
>     db[x] = long_string
>
> print 'INSERT'
> db.close()
> print 'CLOSE'
> time.sleep(10)
> print 'DONE'
> sys.exit()
> #======================
>
> I fear that the leak is in the loop over the result of the 'select'. :-/
>
>
> --
> Davide Alberani <davide.alber...@gmail.com> [PGP KeyID: 0x465BFD47]
> http://www.mimante.net/

_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel
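A minimal sketch of the extract-and-restore idea, assuming Python's built-in sqlite3 module as a stand-in for MySQL and a hypothetical `title` table reduced to `id`, `md5sum` and `imdb_id`. The real IMDbPY schema and the restore query mentioned in the message are not shown here, so the UPDATE below is only an illustration. SQLite spells the copy as `CREATE TABLE ... AS SELECT`; MySQL accepts both that form and the `CREATE TABLE ... SELECT` form used above.

```python
import sqlite3

# Hypothetical minimal schema: only the columns named in the message
# (imdb_id, md5sum) plus a row id; the real tables have many more columns.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE title (id INTEGER PRIMARY KEY, md5sum TEXT, imdb_id INTEGER);
INSERT INTO title (md5sum, imdb_id) VALUES
    ('aaa', 111), ('bbb', NULL), ('ccc', 333);
""")

# Step 1: save every known imdb_id together with the row's md5sum.
cur.execute("""CREATE TABLE title_extract AS
               SELECT imdb_id, md5sum FROM title
               WHERE imdb_id IS NOT NULL""")

# Simulate a fresh import that loses all imdb_id values (demo assumption).
cur.execute("UPDATE title SET imdb_id = NULL")

# Step 2 (hypothetical restore query): copy imdb_id back by matching md5sum;
# rows with no saved entry are left untouched.
cur.execute("""UPDATE title SET imdb_id = (
                   SELECT e.imdb_id FROM title_extract AS e
                   WHERE e.md5sum = title.md5sum)
               WHERE md5sum IN (SELECT md5sum FROM title_extract)""")
conn.commit()

restored = cur.execute(
    "SELECT md5sum, imdb_id FROM title ORDER BY id").fetchall()
print(restored)
```

Matching on md5sum rather than on the row id is what makes this work, assuming the md5sum of a given title stays stable across imports while the row ids may change.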