Hi Ambrose,

Can you specify the complete command line and the database you are using?
Yes, I fear you have lost 1000 entries for each error.

I'm not sure about the root cause of the problem; maybe you need to
specify some additional parameter to the database URI?
See https://imdbpy.readthedocs.io/en/latest/usage/s3.html for an example.

Another obvious source of information is the logs of the database.
Anything useful there?

Hope this helps,

On Thu, Sep 17, 2020 at 12:21 PM Ambrose Chapel <chapel.ambr...@gmail.com> wrote:
>
> I'm running the s32imdbpy.py script to import the gz files into my SQL
> database.
>
> I'm seeing this error a lot, example, when processing name.basics.tsv.gz:
>
> ERROR:<username>:error processing data: 10000 entries lost: 'charmap' codec
> can't encode characters in position 0-9: character maps to <undefined>
>
>
> My database table is set to charset utf8_unicode_ci as per instructions.
>
> I guess my obvious question is how can I prevent this, but also, have I
> really lost 1,000 database entries? Or have I got those 1,000 database
> entries in my database but with some problem unicode characters missing, and
> the message is misleading?
>
> TIA
> _______________________________________________
> Imdbpy-help mailing list
> Imdbpy-help@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/imdbpy-help

-- 
Davide Alberani <davide.alber...@gmail.com> [PGP KeyID: 0x3845A3D4AC9B61AD]
http://www.mimante.net/
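
As a rough illustration of the two points discussed in this thread (the 'charmap' error and the extra parameter in the database URI), here is a small Python sketch. It is not taken from s32imdbpy.py. The helper names encode_error and with_charset are hypothetical, the URI uses placeholder credentials, and the charset=utf8mb4 parameter assumes a MySQL database reached through a driver such as pymysql; other backends may need something different or nothing at all. The cp1252 encoding is only one plausible source of a 'charmap' error (it is a common default on Windows), so treat this as a way to recognise the symptom, not as a confirmed diagnosis.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def encode_error(text, encoding):
    """Return the error message if text cannot be encoded, else None."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


def with_charset(uri, charset="utf8mb4"):
    """Return uri with a charset query parameter added (hypothetical helper)."""
    parts = urlsplit(uri)
    query = dict(parse_qsl(parts.query))
    query["charset"] = charset
    return urlunsplit(parts._replace(query=urlencode(query)))


# Ten Cyrillic letters: none of them exists in cp1252, a common Windows default.
sample = "Константин"
print(encode_error(sample, "cp1252"))  # a 'charmap' codec error message
print(encode_error(sample, "utf-8"))   # None: UTF-8 can encode the name

# Placeholder credentials and database name; the charset parameter is an
# assumption that applies to MySQL drivers such as pymysql, not to every backend.
print(with_charset("mysql+pymysql://user:password@localhost/imdb"))
```

If the first print shows the same kind of message as in the report, the failure is probably happening on the Python side (default encoding of the client or the connection) and not in the table collation, which is why an explicit charset in the URI may be worth trying.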