Here is a little workaround:

-- Extract imdb_id and md5sum (6sec)
CREATE TABLE title_extract SELECT imdb_id, md5sum FROM title WHERE imdb_id IS NOT NULL;
CREATE TABLE name_extract SELECT imdb_id, md5sum FROM name WHERE imdb_id IS NOT NULL;
-- Add indexes (12sec)
ALTER TABLE title_extract ADD INDEX md5sum_idx (md5sum);
ALTER TABLE name_extract ADD INDEX md5sum_idx (md5sum);

-- Reset imdb ids ...
UPDATE title SET imdb_id = NULL;
UPDATE name SET imdb_id = NULL;

-- Restore imdb ids for movies (2min)
UPDATE title INNER JOIN title_extract USING (md5sum) SET title.imdb_id = title_extract.imdb_id;

-- Restore imdb ids for people (5min)
UPDATE name INNER JOIN name_extract USING (md5sum) SET name.imdb_id = name_extract.imdb_id;

Total time for save/restore: less than 10 minutes.

On 12 Feb 2012, at 15:52, Emmanuel Tabard wrote:

> I was wondering, why don't you use the original dbs?
>
> Something like this takes 3 seconds:
>
> "CREATE TABLE title_extract SELECT imdb_id, md5sum FROM title WHERE imdb_id IS NOT NULL;
> CREATE TABLE name_extract SELECT imdb_id, md5sum FROM name WHERE imdb_id IS NOT NULL;"
>
> And use your query to restore.
>
> Should be freaking fast ...
>
> On 12 Feb 2012, at 14:56, Davide Alberani wrote:
>
>> On Sun, Feb 12, 2012 at 14:20, Emmanuel Tabard <m...@webitup.fr> wrote:
>>>
>>> Fair enough!
>>> When it was selecting all the not-null ids, the memory of the process kept
>>> growing while the size of the .db file never grew.
>>> My theory is that dbm saves on close? Does that make sense?
>>
>> Strange (even if, anydbm being a generic interface to various underlying
>> modules, you can never tell).
>>
>> This simple snippet, on my system, creates a 1.2 GB file, and in the process
>> memory is not used much (besides for caches, but that doesn't matter):
>>
>> #!/usr/bin/env python
>> import sys
>> import time
>> import anydbm
>>
>> # 8 KB value, written under 100000 distinct keys
>> long_string = 'LALALALA' * 1024
>> db = anydbm.open('/tmp/big.db', 'n')
>> for x in xrange(100000):
>>     x = str(x)
>>     db[x] = long_string
>>
>> print 'INSERT'
>> db.close()
>> print 'CLOSE'
>> time.sleep(10)
>> print 'DONE'
>> sys.exit()
>> #======================
>>
>> I fear that the leak is in the loop over the result of the 'select'.
>> :-/
>>
>> --
>> Davide Alberani <davide.alber...@gmail.com> [PGP KeyID: 0x465BFD47]
>> http://www.mimante.net/

_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel
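
For anyone who wants to try the save/restore idea from the top of this message without a MySQL server, here is a minimal sketch using Python's sqlite3 module. It is not the code used against the real database: the function names (save_ids, restore_ids, demo) and the sample rows are made up for illustration, and because SQLite has no UPDATE ... INNER JOIN, a correlated subquery is swapped in for the join.

```python
import sqlite3


def save_ids(conn, table):
    """Copy (imdb_id, md5sum) pairs of `table` into `<table>_extract`.

    Table names cannot be bound as SQL parameters, so `table` is
    interpolated directly and must be a trusted identifier.
    """
    extract = table + "_extract"
    conn.execute("DROP TABLE IF EXISTS " + extract)
    conn.execute(
        "CREATE TABLE {0} AS SELECT imdb_id, md5sum FROM {1} "
        "WHERE imdb_id IS NOT NULL".format(extract, table))
    # Index names are global in SQLite, hence the per-table prefix.
    conn.execute(
        "CREATE INDEX {0}_md5sum_idx ON {0} (md5sum)".format(extract))


def restore_ids(conn, table):
    """Fill imdb_id back in for rows whose md5sum was saved."""
    extract = table + "_extract"
    # Correlated subquery instead of MySQL's UPDATE ... INNER JOIN.
    conn.execute(
        "UPDATE {0} SET imdb_id = ("
        "SELECT e.imdb_id FROM {1} AS e WHERE e.md5sum = {0}.md5sum) "
        "WHERE md5sum IN (SELECT md5sum FROM {1})".format(table, extract))
    conn.commit()


def demo():
    """Run save, reset and restore on a tiny in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE title ("
        "id INTEGER PRIMARY KEY, imdb_id INTEGER, md5sum TEXT)")
    conn.executemany(
        "INSERT INTO title (imdb_id, md5sum) VALUES (?, ?)",
        [(133093, "aaa"), (None, "bbb"), (68646, "ccc")])  # made-up rows
    save_ids(conn, "title")
    conn.execute("UPDATE title SET imdb_id = NULL")  # the "reset" step
    restore_ids(conn, "title")
    return conn.execute(
        "SELECT md5sum, imdb_id FROM title ORDER BY md5sum").fetchall()


if __name__ == "__main__":
    print(demo())
```

The row that never had an imdb_id stays NULL after the restore, since it is never copied into the extract table.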