Davide Alberani wrote:
> When you've time, you can try to find a way to make transactions
> work using the -e command line option; the --sqlite-transactions
> option does the equivalent of:
>   -e 'BEFORE_EVERY_TODB:BEGIN TRANSACTION;' -e 'AFTER_EVERY_TODB:COMMIT;'
>   -e 'BEFORE_INDEXES:BEGIN TRANSACTION;' -e 'END:COMMIT;'

I played around with the autocommit feature, which is enabled by
default. I disabled it and executed "COMMIT" after every 1000 inserts.
This makes the import slightly faster (around 10-20%), but not by much.
I also tried batch sizes larger and smaller than 1000, but neither gave
better results.
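A minimal Python 3 sketch of the "autocommit off, COMMIT every N inserts" pattern described above. It uses the standard-library sqlite3 module only as a stand-in for the DB2 driver, and the table name_tbl and the function insert_in_batches are hypothetical names for illustration, not part of imdbpy2sql.py.

```python
import sqlite3


def insert_in_batches(conn, rows, batch_size=1000):
    """Insert (id, name) rows, committing once every `batch_size` rows.

    Assumes a hypothetical table name_tbl(id INTEGER, name TEXT).
    Returns the number of COMMITs issued.
    """
    cur = conn.cursor()
    commits = 0
    for i, row in enumerate(rows, start=1):
        # sqlite3 implicitly opens a transaction before the first INSERT,
        # so nothing is committed until conn.commit() is called.
        cur.execute("INSERT INTO name_tbl (id, name) VALUES (?, ?)", row)
        if i % batch_size == 0:
            conn.commit()  # end the current batch's transaction
            commits += 1
    conn.commit()  # commit the final, possibly partial, batch
    commits += 1
    return commits
```

Varying batch_size is the experiment described in the paragraph; with 2500 rows and the default of 1000, the function issues three commits.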

> I checked: the problems are in _character_ names: three names
> are more than 255 chars (max: 479), and other 3 are over 200.
> Longest movie title is 242, longest person name is 84 and the
> longest company name is 176 chars.
> 
> Now I'm worried about the costs, in terms of wasted space and
> insert time, using a VARCHAR(512) instead of a TEXT or VARCHAR(255)
> column.

I changed this to VARCHAR(500) and DB2 stopped complaining. Maybe this
change should go into CVS, since truncated data isn't a good thing
anyway... At least for DB2, insertion time is the same as before, and
storage space isn't really a big issue today.
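A small Python 3 sketch of the length check discussed above: find the values that would not fit a given VARCHAR length and fail loudly instead of letting them be truncated. The function names are hypothetical. Whether the database counts VARCHAR(n) in characters or in bytes is an assumption you have to check for your setup; the optional encoding argument switches the measurement to encoded bytes.

```python
def value_length(value, encoding=None):
    # Characters by default; pass e.g. "utf-8" to count encoded bytes,
    # in case the database measures VARCHAR(n) in bytes.
    return len(value.encode(encoding)) if encoding else len(value)


def oversized(values, limit, encoding=None):
    """Return (length, value) pairs longer than `limit`, longest first."""
    pairs = [(value_length(v, encoding), v) for v in values]
    return sorted((p for p in pairs if p[0] > limit), reverse=True)


def check_column_fits(values, limit, encoding=None):
    """Raise instead of letting VARCHAR(limit) silently truncate data."""
    too_long = oversized(values, limit, encoding)
    if too_long:
        raise ValueError("%d value(s) exceed VARCHAR(%d); longest is %d"
                         % (len(too_long), limit, too_long[0][0]))
```

For example, check_column_fits(character_names, 500) passes for the lengths quoted above (maximum 479), while a limit of 255 would raise.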

BUT: now I get an exception when writing the biographies to the
database. It seems to be related to the entry "Atkins, Susan (II)" (for
debugging purposes, I set "SCANNING" and "FLUSHING" to happen in the
smallest possible steps)...

--------------------------------------------------------------------
 * FLUSHING SQLData...
 * FLUSHING SQLData...
SCANNING biographies: Atkins, Susan (I)
 * FLUSHING SQLData...
 * FLUSHING SQLData...
SCANNING biographies: Atkins, Susan (II)
 * FLUSHING SQLData...
 * FLUSHING SQLData...
 * FLUSHING SQLData...
 * FLUSHING SQLData...
 * FLUSHING SQLData...
 * FLUSHING SQLData...
 * FLUSHING SQLData...
WARNING: unknown exception caught committing the data
WARNING: to the database; report this as a bug, since
Traceback (most recent call last):
  File "/home/selke/opt/python/bin/imdbpy2sql.py", line 5, in <module>
    pkg_resources.run_script('IMDbPY==3.9cvs20081126', 'imdbpy2sql.py')
  File "/home/selke/opt/python/lib/python2.5/site-packages/setuptools-0.6c9-py2.5.egg/pkg_resources.py", line 448, in run_script
  File "/home/selke/opt/python/lib/python2.5/site-packages/setuptools-0.6c9-py2.5.egg/pkg_resources.py", line 1166, in run_script
  File "/home/selke/opt/python-2.5.2/lib/python2.5/site-packages/IMDbPY-3.9cvs20081126-py2.5-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 2183, in <module>
    run()
  File "/home/selke/opt/python-2.5.2/lib/python2.5/site-packages/IMDbPY-3.9cvs20081126-py2.5-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 2095, in run
    doNMMVFiles()
  File "/home/selke/opt/python-2.5.2/lib/python2.5/site-packages/IMDbPY-3.9cvs20081126-py2.5-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 1659, in doNMMVFiles
    nmmvFiles(fp, funct, fname)
  File "/home/selke/opt/python-2.5.2/lib/python2.5/site-packages/IMDbPY-3.9cvs20081126-py2.5-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 1617, in nmmvFiles
    if v: sqldata.add((mopid, theid, v, note))
  File "/home/selke/opt/python-2.5.2/lib/python2.5/site-packages/IMDbPY-3.9cvs20081126-py2.5-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 983, in add
    self[key] = None
  File "/home/selke/opt/python-2.5.2/lib/python2.5/site-packages/IMDbPY-3.9cvs20081126-py2.5-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 978, in __setitem__
    self.flush()
  File "/home/selke/opt/python-2.5.2/lib/python2.5/site-packages/IMDbPY-3.9cvs20081126-py2.5-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 1038, in flush
    (len(self._tmpDict), e)
AttributeError: 'SQLData' object has no attribute '_tmpDict'
--------------------------------------------------------------------
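Reading the last frame, the exception handler inside flush() appears to fail itself: it formats len(self._tmpDict), an attribute this SQLData object apparently does not have, so the original database error (possibly caused by the long biography) is hidden behind the AttributeError. The Python 3 sketch below is not IMDbPY's code; BufferedInserter, write_batch and flush_every are hypothetical names. It illustrates a buffer that flushes once enough rows accumulate and whose error handler only touches state it is sure exists.

```python
import sys


class BufferedInserter(dict):
    """Hypothetical stand-in for a buffer like SQLData: collect rows as
    dict keys and write them in one batch every `flush_every` rows."""

    def __init__(self, write_batch, flush_every=1000):
        dict.__init__(self)
        self.write_batch = write_batch  # callable that sends a list of rows
        self.flush_every = flush_every
        self.failed = []  # (batch, exception) pairs that could not be written

    def add(self, row):
        self[row] = None
        if len(self) >= self.flush_every:
            self.flush()

    def flush(self):
        if not self:
            return
        batch = list(self)
        try:
            self.write_batch(batch)
        except Exception as e:
            # Only use local state here (len(batch)), so the real database
            # error is reported instead of being masked by an AttributeError.
            sys.stderr.write("WARNING: error writing %d rows: %s\n"
                             % (len(batch), e))
            self.failed.append((batch, e))
        finally:
            self.clear()
```

Setting flush_every=1 mirrors the "smallest possible steps" debugging setup: each failed batch then contains exactly one row, which points straight at the offending entry.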

Any ideas? (I noticed that this biography is pretty long. Maybe it is
too long to fit in a "LONG VARCHAR" field in DB2, which is limited to
32,700 bytes; CLOB($maxlength) would probably be a better column type
here anyway. How can I change that?)
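A Python 3 sketch of choosing the biography column type from the data: keep LONG VARCHAR while every text fits the 32,700 limit mentioned above, otherwise use a CLOB sized to the longest text. The function name is hypothetical, and measuring UTF-8 bytes is an assumption about how the limit is counted; where imdbpy2sql.py actually defines this column type is not shown here.

```python
LONG_VARCHAR_LIMIT = 32700  # DB2 LONG VARCHAR limit quoted in the message


def biography_column_type(texts, limit=LONG_VARCHAR_LIMIT):
    """Return "LONG VARCHAR" if all texts fit, else "CLOB(n)" for the
    longest text, measuring length in UTF-8 bytes (an assumption)."""
    longest = max((len(t.encode("utf-8")) for t in texts), default=0)
    if longest <= limit:
        return "LONG VARCHAR"
    return "CLOB(%d)" % longest
```

The returned string can be dropped into a CREATE TABLE statement; in practice you would probably round n up to leave headroom for future, longer biographies.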


BTW: the current CVS version of imdbpy2sql.py contains a "sys.exit()"
call at line 2051, which stops the script right before it populates the
database, at least on my system...
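A quick Python 3 sketch for spotting leftover sys.exit() calls like the one above before running the import. It is a plain text scan (find_sys_exit_lines is a hypothetical helper) rather than a parser, so it also works on older Python 2 sources; it ignores trailing # comments but may still match inside string literals.

```python
import re

EXIT_CALL = re.compile(r"\bsys\.exit\s*\(")


def find_sys_exit_lines(source):
    """Return 1-based line numbers whose code part calls sys.exit(."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        code = line.split("#", 1)[0]  # crude: drop anything after '#'
        if EXIT_CALL.search(code):
            hits.append(number)
    return hits
```

Usage: with open("imdbpy2sql.py") as f: print(find_sys_exit_lines(f.read())).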


Joachim
-- 
M. Sc. Joachim Selke
Technische Universität Braunschweig, Institut für Informationssysteme
Mühlenpfordtstraße 23, 38106 Braunschweig, Germany
<http://www.l3s.uni-hannover.de/~selke>


_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel
