Davide Alberani wrote:
> When you've time, you can try to find a way to make transactions
> work using the -e command line option; the --sqlite-transactions
> option does the equivalent of:
>  -e 'BEFORE_EVERY_TODB:BEGIN TRANSACTION;' -e 'AFTER_EVERY_TODB:COMMIT;' -e
>  'BEFORE_INDEXES:BEGIN TRANSACTION;' -e 'END:COMMIT;'
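
For illustration, the BEGIN TRANSACTION / COMMIT bracketing quoted above can be sketched with Python's standard sqlite3 module. This is only a minimal sketch, not code from imdbpy2sql.py: the table "demo", the function insert_in_batches and the batch size are made up for the example, and other databases (DB2 included) may behave differently.

```python
import sqlite3


def insert_in_batches(conn, rows, batch_size=1000):
    """Insert rows, wrapping each batch in one explicit transaction.

    Hypothetical helper for illustration; returns the number of commits.
    """
    cur = conn.cursor()
    commits = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        cur.execute("BEGIN TRANSACTION;")
        cur.executemany("INSERT INTO demo (name) VALUES (?)", batch)
        cur.execute("COMMIT;")
        commits += 1
    return commits


# isolation_level=None leaves transaction control to the explicit
# BEGIN/COMMIT statements above (the module opens no implicit ones).
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE demo (name TEXT)")

rows = [("name %d" % i,) for i in range(2500)]
commits = insert_in_batches(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
```

With 2500 rows and a batch size of 1000, this issues three commits instead of one per row.
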

I played around with the autocommit feature. Autocommit is enabled by
default. I disabled it and executed "COMMIT" every 1000 inserts. This
makes the import slightly faster (around 10-20%), but not by much. I also
tried batch sizes larger and smaller than 1000, but they didn't give
better results.

> I checked: the problems are in _character_ names: three names
> are more than 255 chars (max: 479), and other 3 are over 200.
> Longest movie title is 242, longest person name is 84 and the
> longest company name is 176 chars.
>
> Now I'm worried about the costs, in terms of wasted space and
> insert time, using a VARCHAR(512) instead of a TEXT or VARCHAR(255)
> column.

I changed this to VARCHAR(500) and DB2 stopped complaining. Maybe this
modification should go into CVS, since truncated data isn't a good thing
anyway ... At least for DB2, insertion time is the same as before. And
storage space isn't really a big issue today.

BUT: Now I get an exception when writing the biographies to the database.
It seems to be related to the entry "Atkins, Susan (II)" (for debugging
purposes I set the "SCANNING" and "FLUSHING" to be done in steps as small
as possible) ...

--------------------------------------------------------------------
 * FLUSHING SQLData...
 * FLUSHING SQLData...
SCANNING biographies: Atkins, Susan (I)
 * FLUSHING SQLData...
 * FLUSHING SQLData...
SCANNING biographies: Atkins, Susan (II)
 * FLUSHING SQLData...
 * FLUSHING SQLData...
 * FLUSHING SQLData...
 * FLUSHING SQLData...
 * FLUSHING SQLData...
 * FLUSHING SQLData...
 * FLUSHING SQLData...
WARNING: unknown exception caught committing the data
WARNING: to the database; report this as a bug, since
Traceback (most recent call last):
  File "/home/selke/opt/python/bin/imdbpy2sql.py", line 5, in <module>
    pkg_resources.run_script('IMDbPY==3.9cvs20081126', 'imdbpy2sql.py')
  File "/home/selke/opt/python/lib/python2.5/site-packages/setuptools-0.6c9-py2.5.egg/pkg_resources.py", line 448, in run_script
  File "/home/selke/opt/python/lib/python2.5/site-packages/setuptools-0.6c9-py2.5.egg/pkg_resources.py", line 1166, in run_script
  File "/home/selke/opt/python-2.5.2/lib/python2.5/site-packages/IMDbPY-3.9cvs20081126-py2.5-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 2183, in <module>
    run()
  File "/home/selke/opt/python-2.5.2/lib/python2.5/site-packages/IMDbPY-3.9cvs20081126-py2.5-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 2095, in run
    doNMMVFiles()
  File "/home/selke/opt/python-2.5.2/lib/python2.5/site-packages/IMDbPY-3.9cvs20081126-py2.5-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 1659, in doNMMVFiles
    nmmvFiles(fp, funct, fname)
  File "/home/selke/opt/python-2.5.2/lib/python2.5/site-packages/IMDbPY-3.9cvs20081126-py2.5-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 1617, in nmmvFiles
    if v: sqldata.add((mopid, theid, v, note))
  File "/home/selke/opt/python-2.5.2/lib/python2.5/site-packages/IMDbPY-3.9cvs20081126-py2.5-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 983, in add
    self[key] = None
  File "/home/selke/opt/python-2.5.2/lib/python2.5/site-packages/IMDbPY-3.9cvs20081126-py2.5-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 978, in __setitem__
    self.flush()
  File "/home/selke/opt/python-2.5.2/lib/python2.5/site-packages/IMDbPY-3.9cvs20081126-py2.5-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 1038, in flush
    (len(self._tmpDict), e)
AttributeError: 'SQLData' object has no attribute '_tmpDict'
--------------------------------------------------------------------

Any ideas?
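
One thing the traceback suggests: the AttributeError is raised on a line that formats a message from "e", so it appears to come from the error-reporting code in flush() rather than from the database, and the original database error is probably hidden behind it. The sketch below reproduces that pattern in isolation. It is an assumption-laden illustration, not the real code: SQLDataSketch, _to_db and the ValueError are hypothetical stand-ins, and the actual imdbpy2sql.py may differ.

```python
class SQLDataSketch(dict):
    """Hypothetical stand-in, NOT the real SQLData class from imdbpy2sql.py."""

    def _to_db(self):
        # Hypothetical database failure, e.g. a value too long for its column.
        raise ValueError("value too long for column")

    def flush_masking(self):
        try:
            self._to_db()
        except Exception as e:
            # self._tmpDict was never set, so building the warning message
            # raises AttributeError and the original error is never shown.
            print("WARNING: %d items not written: %s" % (len(self._tmpDict), e))

    def flush_reporting(self):
        try:
            self._to_db()
        except Exception as e:
            # Only uses attributes that exist, then re-raises the real error.
            print("WARNING: %d items not written: %s" % (len(self), e))
            raise


def error_name(func):
    """Return the class name of the exception raised by func, or None."""
    try:
        func()
    except Exception as e:
        return type(e).__name__
    return None


data = SQLDataSketch()
masked = error_name(data.flush_masking)      # "AttributeError"
reported = error_name(data.flush_reporting)  # "ValueError"
```

If the same thing is happening here, making the handler print the original exception should reveal what DB2 actually rejected.
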

(I noticed that this biography is pretty long. Maybe it is too long to be
kept within a "LONG VARCHAR" field within DB2, which is limited to 32 700
characters; CLOB($maxlength) probably would be a better column type here
anyway; how can I change that?)

BTW: The current CVS version of imdbpy2sql.py contains "sys.exit()" in
line 2051, which will stop the script's execution right before populating
the database on my system ...

Joachim

-- 
M. Sc. Joachim Selke
Technische Universität Braunschweig, Institut für Informationssysteme
Mühlenpfordtstraße 23, 38106 Braunschweig, Germany
<http://www.l3s.uni-hannover.de/~selke>

_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel