On Sun, May 14, 2017 at 10:36 PM, Philip Earvolino <pearvol...@gmail.com> wrote:
> Hello.  I am now working with the mySQL db and the titles do not, apparently, 
> have the right encoding (i.e., certain characters do not appear properly).  
> The encoding is cp1252 West European (latin1) and the collation is latin1_bin 
> which are what is specified in the flat file IMDB export and, I think(?), in 
> the imdb sql creation script.

IMDbPY takes the iso-8859-1 plain text files and converts them to utf-8.

If I remember correctly, we don't force the db collations to be utf-8 - and we
didn't document it :-/ - so if you've created your db and tables as cp1252, it's
expected that the data looks garbled.
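
To illustrate the mismatch, here is a minimal Python sketch (not IMDbPY code;
the sample title is made up) of what happens when utf-8 bytes are read back
as cp1252:

```python
# Hypothetical sample title; any non-ASCII character shows the effect.
title = "Amélie"

# What ends up in the table: the utf-8 bytes of the title.
stored_bytes = title.encode("utf-8")

# What a cp1252 (latin1) table or connection displays for those same bytes.
garbled = stored_bytes.decode("cp1252")
print(garbled)

# The bytes themselves are intact: reversing the wrong decoding restores the title.
restored = garbled.encode("cp1252").decode("utf-8")
print(restored)
```

As long as the stored bytes are unchanged, the text is only displayed wrongly
and can still be recovered.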

> Any suggestions?

I don't know what happens if you change your collation encoding to
utf8_unicode_ci (or something like that).
If MySQL doesn't touch the data, great, otherwise you will have an
even bigger mess, I fear.
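
The two possible outcomes can be modelled in plain Python. This is only a
sketch under assumptions: it does not talk to MySQL and does not establish
which of the two MySQL actually does; the function names and the sample title
are hypothetical.

```python
# Bytes currently in the table: utf-8 data sitting in a column labelled cp1252.
stored = "Amélie".encode("utf-8")

def relabel_only(data: bytes) -> bytes:
    """Outcome 1 (assumed): only the declared encoding changes, bytes untouched."""
    return data

def transcode(data: bytes) -> bytes:
    """Outcome 2 (assumed): bytes are read as cp1252 and rewritten as utf-8."""
    return data.decode("cp1252").encode("utf-8")

# Outcome 1: the column now says utf-8 and the bytes really are utf-8.
print(relabel_only(stored).decode("utf-8"))

# Outcome 2: the data is now double-encoded, the "even bigger mess".
double_encoded = transcode(stored)
print(double_encoded.decode("utf-8"))

# For this sample the double encoding can still be undone by reversing the steps.
repaired = double_encoded.decode("utf-8").encode("cp1252").decode("utf-8")
print(repaired)
```

Either way, taking a dump of the database before changing anything seems
prudent.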


Davide Alberani <davide.alber...@gmail.com>  [PGP KeyID: 0x3845A3D4AC9B61AD]
