On Mon, Apr 11, 2011 at 18:35, darklow <dark...@gmail.com> wrote:
>
>   File "./imdbpy2sql.py", line 1194, in _toDB
>     CURS.executemany(self.sqlstr, self.converter(l))
> psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0xc320
> HINT:  This error can also happen if the byte sequence does not match the
> encoding expected by the server, which is controlled by "client_encoding".

Hi all,
I'm writing regarding the recent "0xc320" problem with IMDbPY.

The above notice is extremely interesting, and should be investigated:
how can it be that 0xc320 is not UTF8 encodable? It should work; from
the Python prompt:

    >>> unichr(0xc320).encode('utf8')
    '\xec\x8c\xa0'

Anyway, as a quick and dirty fix (the main problem is probably some
garbage in the data files), try this: after line 1181 of imdbpy2sql.py,
add:

    k = k.replace('\xec\x8c\xa0', '')

So that the nearby lines will become:

    try:
        k = k.replace('\xec\x8c\xa0', '')
        t = analyze_name(k)
    except IMDbParserError:

Please be aware that this fix was not tested at all, but I'm almost sure
that, at the above point, 'k' is a string encoded in utf8.

Anyway, besides the "garbage theory", I have another idea about the
source of the error, but I have to verify it later...

Bye, and let me know if it works!

-- 
Davide Alberani <davide.alber...@gmail.com> [PGP KeyID: 0x465BFD47]
http://www.mimante.net/

_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel
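
A note on the "0xc320" in the quoted error: one possible reading, which the message above does not confirm, is that PostgreSQL is printing the two raw bytes 0xC3 0x20 rather than the code point U+C320. Under that reading the data contains a UTF-8 lead byte (0xC3) followed by a plain space (0x20), which is not a valid sequence, and replacing '\xec\x8c\xa0' would not touch it. The sketch below checks this in Python 3 (the original script targets Python 2). The sample bytes and the helper names is_valid_utf8 and drop_invalid_utf8 are made up for illustration and are not part of imdbpy2sql.py.

```python
# Assumption: "0xc320" in the PostgreSQL error means the raw bytes
# 0xC3 0x20, not the code point U+C320.

# Hypothetical sample: a 0xC3 lead byte followed by a space.
raw = b"abc\xc3 def"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def drop_invalid_utf8(data: bytes) -> bytes:
    """Return data with undecodable bytes removed (lossy clean-up)."""
    return data.decode("utf-8", errors="ignore").encode("utf-8")


if __name__ == "__main__":
    # The code point U+C320 itself encodes without trouble.
    print(chr(0xC320).encode("utf-8"))
    # The byte pair C3 20 is what fails to decode.
    print(is_valid_utf8(raw))
    # Stripping invalid bytes keeps the rest of the string.
    print(drop_invalid_utf8(raw))
```

If this reading is right, a clean-up along these lines would need to run on the raw byte string before it reaches the database. Note that errors="ignore" silently discards data, so logging the affected names first is probably wise.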