Davide Alberani wrote:
> When you've time, you can try to find a way to make transactions
> work using the -e command line option; the --sqlite-transactions
> option does the equivalent of:
> -e 'BEFORE_EVERY_TODB:BEGIN TRANSACTION;' -e 'AFTER_EVERY_TODB:COMMIT;' -e
> 'BEFORE_INDEXES:BEGIN TRANSACTION;' -e 'END:COMMIT;'
I played around with the autocommit feature, which is enabled by
default. I disabled it and executed "COMMIT" every 1000 inserts. That
makes the import slightly faster (around 10-20%), but not by much. I
also tried batch sizes larger and smaller than 1000, but none of them
gave better results.
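
For reference, a minimal sketch of the "commit every N inserts" pattern
described above. It uses Python's sqlite3 module as a stand-in, not the
DB2 driver; the table "demo" and the function insert_in_batches are
made up for illustration and are not part of imdbpy2sql.py.

```python
import sqlite3

def insert_in_batches(conn, rows, batch_size=1000):
    """Insert rows, committing once per batch_size inserts.

    Returns the number of COMMITs issued (hypothetical helper).
    """
    cur = conn.cursor()
    commits = 0
    pending = 0
    for row in rows:
        cur.execute("INSERT INTO demo (id, name) VALUES (?, ?)", row)
        pending += 1
        if pending >= batch_size:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        # Commit whatever is left over from the last partial batch.
        conn.commit()
        commits += 1
    return commits

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, name TEXT)")
conn.commit()

rows = [(i, "name %d" % i) for i in range(2500)]
n_commits = insert_in_batches(conn, rows, batch_size=1000)
count = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
```

With 2500 rows and a batch size of 1000 this issues two full-batch
commits plus one for the remainder. Whether batching helps, and by how
much, depends on the database backend.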
> I checked: the problems are in _character_ names: three names
> are more than 255 chars (max: 479), and other 3 are over 200.
> Longest movie title is 242, longest person name is 84 and the
> longest company name is 176 chars.
>
> Now I'm worried about the costs, in terms of wasted space and
> insert time, using a VARCHAR(512) instead of a TEXT or VARCHAR(255)
> column.
I changed this to VARCHAR(500) and DB2 stopped complaining. Maybe this
modification should go into CVS, since truncated data isn't a good
thing anyway ... At least for DB2, insertion time is the same as before,
and storage space isn't really a big issue today.
BUT: Now I get an exception when writing the biographies to the
database. It seems to be related to the entry "Atkins, Susan (II)" (for
debugging purposes I set the "SCANNING" and "FLUSHING" to be done in
steps as small as possible) ...
--------------------------------------------------------------------
* FLUSHING SQLData...
* FLUSHING SQLData...
SCANNING biographies: Atkins, Susan (I)
* FLUSHING SQLData...
* FLUSHING SQLData...
SCANNING biographies: Atkins, Susan (II)
* FLUSHING SQLData...
* FLUSHING SQLData...
* FLUSHING SQLData...
* FLUSHING SQLData...
* FLUSHING SQLData...
* FLUSHING SQLData...
* FLUSHING SQLData...
WARNING: unknown exception caught committing the data
WARNING: to the database; report this as a bug, since
Traceback (most recent call last):
  File "/home/selke/opt/python/bin/imdbpy2sql.py", line 5, in <module>
    pkg_resources.run_script('IMDbPY==3.9cvs20081126', 'imdbpy2sql.py')
  File "/home/selke/opt/python/lib/python2.5/site-packages/setuptools-0.6c9-py2.5.egg/pkg_resources.py", line 448, in run_script
  File "/home/selke/opt/python/lib/python2.5/site-packages/setuptools-0.6c9-py2.5.egg/pkg_resources.py", line 1166, in run_script
  File "/home/selke/opt/python-2.5.2/lib/python2.5/site-packages/IMDbPY-3.9cvs20081126-py2.5-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 2183, in <module>
    run()
  File "/home/selke/opt/python-2.5.2/lib/python2.5/site-packages/IMDbPY-3.9cvs20081126-py2.5-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 2095, in run
    doNMMVFiles()
  File "/home/selke/opt/python-2.5.2/lib/python2.5/site-packages/IMDbPY-3.9cvs20081126-py2.5-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 1659, in doNMMVFiles
    nmmvFiles(fp, funct, fname)
  File "/home/selke/opt/python-2.5.2/lib/python2.5/site-packages/IMDbPY-3.9cvs20081126-py2.5-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 1617, in nmmvFiles
    if v: sqldata.add((mopid, theid, v, note))
  File "/home/selke/opt/python-2.5.2/lib/python2.5/site-packages/IMDbPY-3.9cvs20081126-py2.5-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 983, in add
    self[key] = None
  File "/home/selke/opt/python-2.5.2/lib/python2.5/site-packages/IMDbPY-3.9cvs20081126-py2.5-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 978, in __setitem__
    self.flush()
  File "/home/selke/opt/python-2.5.2/lib/python2.5/site-packages/IMDbPY-3.9cvs20081126-py2.5-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 1038, in flush
    (len(self._tmpDict), e)
AttributeError: 'SQLData' object has no attribute '_tmpDict'
--------------------------------------------------------------------
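
Judging from the traceback, the AttributeError appears to be raised
while the error message itself is being built, which would hide the
original database exception. The sketch below reproduces that pattern
in isolation. The classes BrokenFlusher and SafeFlusher, the "pending"
attribute and the simulated error text are hypothetical stand-ins, not
the actual SQLData code.

```python
class BrokenFlusher(object):
    """Hypothetical stand-in for a buffer that flushes to a database."""

    def __init__(self):
        self.pending = {"key": None}

    def _write(self):
        # Simulated database failure; the real cause is unknown here.
        raise ValueError("value too long for column")

    def flush(self):
        try:
            self._write()
        except Exception as e:
            # Bug pattern: the handler reads an attribute that does not
            # exist, so an AttributeError replaces the original error.
            return "error flushing %d items: %s" % (len(self._tmpDict), e)


class SafeFlusher(BrokenFlusher):
    def flush(self):
        try:
            self._write()
        except Exception as e:
            # Fall back to an attribute that is known to exist.
            items = getattr(self, "_tmpDict", self.pending)
            return "error flushing %d items: %s" % (len(items), e)


try:
    BrokenFlusher().flush()
    masked = None
except AttributeError as err:
    masked = str(err)

report = SafeFlusher().flush()
```

In the broken variant only the AttributeError surfaces; in the safe
variant the report still carries the text of the underlying error, which
is what would be needed to see why the flush failed.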
Any ideas? (I noticed that this biography is pretty long. Maybe it is
too long to be kept in a "LONG VARCHAR" field in DB2, which is limited
to 32,700 bytes; CLOB($maxlength) would probably be a better column
type here anyway. How can I change that?)
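
One way to test the "too long" hypothesis is to measure the values
before they reach the database. This is only a sketch: the function
name oversized and the sample strings are invented, and it assumes the
32,700 limit is counted in UTF-8 bytes.

```python
def oversized(values, limit):
    """Return (index, byte_length) for values longer than limit bytes.

    Assumption: the column limit applies to the UTF-8 encoded length.
    """
    result = []
    for i, value in enumerate(values):
        n = len(value.encode("utf-8"))
        if n > limit:
            result.append((i, n))
    return result


# Made-up sample data, not taken from the IMDb files.
samples = ["short bio", "x" * 32700, "y" * 32701, "ü" * 20000]
too_long = oversized(samples, 32700)
```

Here the third and fourth samples are flagged; the fourth has only
20000 characters but twice as many bytes, which is why the byte length
rather than the character count is checked.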
BTW: The current CVS version of imdbpy2sql.py contains "sys.exit()"
in line 2051, which will stop the script's execution right before
populating the database on my system ...
Joachim
--
M. Sc. Joachim Selke
Technische Universität Braunschweig, Institut für Informationssysteme
Mühlenpfordtstraße 23, 38106 Braunschweig, Germany
<http://www.l3s.uni-hannover.de/~selke>
_______________________________________________
Imdbpy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel