Re: [Imdbpy-devel] [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding "UTF8"

Davide Alberani Wed, 13 Apr 2011 13:56:52 -0700

On Mon, Apr 11, 2011 at 18:35, darklow <dark...@gmail.com> wrote:
>
>   File "./imdbpy2sql.py", line 1194, in _toDB
>     CURS.executemany(self.sqlstr, self.converter(l))
> psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0xc320
> HINT:  This error can also happen if the byte sequence does not match the
> encoding expected by the server, which is controlled by "client_encoding".


Hi all,
I'm writing about the recent "0xc320" problem with IMDbPY.
The error message above is very interesting, and should be investigated:
how can it be that 0xc320 cannot be encoded as UTF-8?
Taken as a code point, it encodes fine; from the Python prompt:
  >>> unichr(0xc320).encode('utf8')
  '\xec\x8c\xa0'
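
One possible explanation, offered as an assumption and not as a verified
diagnosis: PostgreSQL reports the offending bytes in hex, so "0xc320" may
mean the two raw bytes 0xC3 0x20 and not the code point U+C320. In UTF-8,
0xC3 opens a two-byte sequence and 0x20 (a space) is not a valid second
byte. A minimal sketch comparing the two readings (the variable names are
only illustrative):

```python
# Reading 1: 0xc320 as a single code point, U+C320.
# This is a valid character and encodes to three UTF-8 bytes.
as_code_point = u'\uc320'.encode('utf-8')

# Reading 2: 0xc320 as two raw bytes, 0xC3 followed by 0x20.
# 0xC3 is a lead byte that needs a continuation byte (0x80-0xBF);
# 0x20 is a plain space, so decoding is expected to fail.
raw = b'\xc3\x20'
try:
    raw.decode('utf-8')
    raw_is_valid = True
except UnicodeDecodeError:
    raw_is_valid = False

print(repr(as_code_point), raw_is_valid)
```

If the second reading is the right one, the data probably contains a
truncated or mis-encoded accented character followed by a space.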

Anyway, as a quick and dirty fix (the main problem is probably some
crap in the data files), try this: after line 1181 of imdbpy2sql.py, add:
  k = k.replace('\xec\x8c\xa0', '')

So that the nearby lines will become:
            try:
                k = k.replace('\xec\x8c\xa0', '')
                t = analyze_name(k)
            except IMDbParserError:

Please be aware that this fix is completely untested, but I'm
almost sure that, at the above point, 'k' is a UTF-8 encoded string.
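
A more general variant of the same workaround, sketched under two
assumptions: that 'k' really is a UTF-8 byte string at that point, and that
the server message refers to the raw bytes 0xC3 0x20 (in which case a
replace on '\xec\x8c\xa0' would never match). The helper name
scrub_invalid_utf8 is hypothetical, not something in imdbpy2sql.py, and
this is as untested against the real data as the fix above:

```python
def scrub_invalid_utf8(raw):
    """Drop any bytes that do not form valid UTF-8 sequences.

    Hypothetical helper for illustration; not part of imdbpy2sql.py.
    Decoding with the 'ignore' error handler discards invalid bytes,
    and re-encoding returns a clean UTF-8 byte string.
    """
    return raw.decode('utf-8', 'ignore').encode('utf-8')


# A made-up name with a dangling 0xC3 lead byte followed by a space.
damaged = b'Ren\xc3 Clair'
cleaned = scrub_invalid_utf8(damaged)

# A correctly encoded name passes through unchanged.
intact = scrub_invalid_utf8(b'Ren\xc3\xa9 Clair')

print(repr(cleaned), repr(intact))
```

Note that this silently loses the damaged character, so it hides the
problem in the data files without repairing it.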

Anyway, besides the "garbage theory", I have another idea
about the source of the error, but I have to verify it later...

Bye, and let me know if it works!

-- 
Davide Alberani <davide.alber...@gmail.com>  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/

