Hi,
I'm having some trouble importing the text files with imdbpy2sql.
I'm running Debian with python 2.6.6-8+b1, postgresql 9.0.3-1 and
imdbpy 4.7.0-1.
I created a database called imdb in the usual way. Debian puts
imdbpy2sql in /usr/share/doc/python-imdbpy/examples/imdbpy2sql.py.gz.
I usually extract it and put it in /tmp/imdbpy2sql.py. I ran this:
$ /tmp/imdbpy2sql.py -d ~/imdb/lists -u postgres:///var/run/postgresql/imdb
It starts processing as normal. However, at some point in the middle of
the actors list, psycopg2 throws a DataError:
* FLUSHING SQLData...
SCANNING actor: Hartley, Jalaal
SCANNING actor: Harwood, Anthony (II)
* FLUSHING PersonsCache...
* FLUSHING SQLData...
SCANNING actor: Hatcher, Steve
SCANNING actor: Havers, Nigel
* FLUSHING SQLData...
SCANNING actor: Hayden, Luke
* FLUSHING CharactersCache...
Traceback (most recent call last):
  File "/tmp/imdbpy2sql.py", line 2950, in <module>
    run()
  File "/tmp/imdbpy2sql.py", line 2811, in run
    castLists(_charIDsList=characters_imdbIDs)
  File "/tmp/imdbpy2sql.py", line 1575, in castLists
    doCast(f, roleid, rolename)
  File "/tmp/imdbpy2sql.py", line 1534, in doCast
    cid = CACHE_CID.addUnique(role)
  File "/tmp/imdbpy2sql.py", line 957, in addUnique
    else: return self.add(key, miscData)
  File "/tmp/imdbpy2sql.py", line 950, in add
    self[key] = c
  File "/tmp/imdbpy2sql.py", line 860, in __setitem__
    self.flush()
  File "/tmp/imdbpy2sql.py", line 883, in flush
    self._toDB(quiet)
  File "/tmp/imdbpy2sql.py", line 1185, in _toDB
    CURS.executemany(self.sqlstr, self.converter(l))
psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0xc320
When I first run /usr/share/doc/python-imdbpy/goodies/reduce.sh to get the
data size down a little, the whole import works fine. So I'm guessing
there are some stray characters somewhere in the text that are not
being decoded properly to Unicode, but I have no idea where to start
fixing it.
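
For what it's worth, the byte pair in the error does look malformed on its
own: 0xc3 opens a two-byte UTF-8 sequence and has to be followed by a
continuation byte in the range 0x80-0xbf, but 0x20 is a plain space. A
minimal sketch of a helper that could be used to hunt for such bytes is
below. The name find_invalid_utf8 is my own and is not part of imdbpy, and
the sample string is made up. Which bytes to feed it is also an open
question: if the list files are not UTF-8 to begin with, checking the raw
lines would flag every accented character, so the values built just before
the executemany call may be the better target.

```python
def find_invalid_utf8(data):
    """Return (offset, offending_bytes) for each spot in a byte string
    that is not valid UTF-8. Hypothetical helper, not part of imdbpy."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            # The rest of the data decodes cleanly, so we are done.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as exc:
            start = pos + exc.start
            end = pos + exc.end
            # Keep one byte after the invalid run for context, which is
            # how PostgreSQL reports the pair (e.g. 0xc320).
            problems.append((start, data[start:end + 1]))
            # Resume scanning right after the invalid run.
            pos = end
    return problems


# Made-up sample: a lone 0xc3 followed by a space, as in the error message.
sample = b"Hayden\xc3 Luke"
result = find_invalid_utf8(sample)
```

Here result is a list with the single entry (6, b"\xc3 "), that is, the
offset of the bad byte and the same two bytes that PostgreSQL complains
about.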
Regards
--
Tom
_______________________________________________
Imdbpy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel