Re: [melbourne-pug] Unicode for windows dummies

Anthony Briggs Mon, 15 Aug 2016 21:25:43 -0700

Hi Mike,

I was just trying to solve a similar problem at the PyconAU sprints :)


The error is that some Unicode characters have no equivalent in the
Windows console's 'charmap' encoding, so they can't be printed to the
terminal. You can replicate it with this code:

print("Mÿ hôvèrçràft îß fûłl öf éêlś".encode("cp1252"))


The solution depends on what you're trying to do:

   - If it's a one-off thing, you can find and eliminate the offending
   non-ASCII characters in the csv file.
   - Failing that, you can encode to 'cp1252' and replace or ignore the
   unicode characters that don't map.
   https://docs.python.org/3/howto/unicode.html#the-string-type has more
   details, but something like line.encode("cp1252", "replace") on the
   lines that you're reading from the csv file should work (i.e. convert
   to the Windows encoding).
   - Thirdly, there's a package called unicodecsv, which is a drop-in utf-8
   version of the csv module, and might fix your unicode errors.
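
As a rough sketch of the second option (the helper name printable() is
made up for illustration, and I'm assuming the console code page is
cp850, as the traceback suggests), you can round-trip each line through
the target encoding so that anything unmappable is replaced or dropped
before it gets printed:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals


def printable(text, encoding="cp850", errors="replace"):
    """Return text with characters the encoding can't represent replaced.

    Hypothetical helper: encode to the target code page using the given
    error handler, then decode back so the result is still a text string.
    """
    return text.encode(encoding, errors).decode(encoding)


# U+2030 (per mille sign) is the character named in the traceback.
sample = "0.5\u2030"

print(printable(sample, "cp850"))            # unmappable character becomes "?"
print(printable(sample, "cp850", "ignore"))  # unmappable character is dropped
```

Note that cp1252 does have a slot for U+2030, so the same string encodes
to cp1252 without any replacement; it's the cp850 console encoding that
can't represent it.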

Anthony



On 16 August 2016 at 11:01, Mike Dewhirst <[email protected]> wrote:

> If anyone can point me to the appropriate advice for resolving the error
> below I would be most appreciative. Really very appreciative.
>
> I think I understand Unicode in theory and have reread a lot of articles
> including ...
>
> * https://docs.python.org/3/library/codecs.html#encodings-and-unicode
> * https://pythonconquerstheuniverse.wordpress.com/2010/05/30/unicode-beginners-introduction-for-dummies-made-simple/
> * https://pythonconquerstheuniverse.wordpress.com/2010/06/04/unicode-for-dummies-just-use-utf-8/
> * https://en.wikipedia.org/wiki/UTF-8
>
> This is the error which has stumped me ...
>
> (xxex3) C:\Users\mike\env\xxex3\ssds>python substance/data_imports/map_csv.py
>
> Traceback (most recent call last):
>   File "substance/data_imports/map_csv.py", line 139, in <module>
>     csvdata = CsvImport(csvfile, company, start, finish)
>   File "substance/data_imports/map_csv.py", line 127, in __init__
>     print("%s" % cells)
>   File "C:\Users\mike\env\xxex3\lib\encodings\cp850.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u2030' in
> position 7452: character maps to <undefined>
>
>
> I have saved the csv file involved as utf-8 using LibreOffice 5 on Windows
> 8.1. from the original Microsoft Excel spreadsheet.
>
> This is in Python 3.5 on Windows but it also needs to run in Python 2.7 on
> Ubuntu 14.04 server (no gui).
>
> map_csv.py [1] is the beginning of a module I want to develop into a
> generic data import facility. I'm starting with a specific csv file I need
> to import (not mine and its contents are private) and all it does at the
> moment is read in the file and print the lines to stdout.
>
> I have tried utf-8 encoding each line and that gets past the error but
> just produces a set of chars, a snippet of which is below [2]. Decoding
> that as utf-8 reproduces the error, as might be expected. I have also
> tried decoding as utf-16 and encoding it as utf-8 but that didn't work
> either.
>
> Thanks for reading this far
>
> Mike
>
> [1] ...
>
> from __future__ import unicode_literals
> import os
>
> class CsvImport(object):
>     """ Imports a csv file and converts it into a list of lists """
>     def __init__(self, csvfile, company, start, finish):
>         self.company = company
>         self.rows = list()
>         with open(csvfile, "r") as csv:
>             i = 0
>             self.rows = csv.readlines()
>             for line in self.rows:
>                 i += 1
>                 cells = list(line)
>                 if i >= start:
>                     print("%s" % cells)
>                 if i > finish:
>                     break
>
> if __name__ == "__main__":
>     company = "Calia Pty Ltd"
>     dirname = "{0}/csv".format(company.split()[0].lower())
>     filename = "{0}1.csv".format(company.split()[0].lower())
>     start = 105
>     finish = 404
>     currdir = os.path.realpath(os.path.dirname(__file__)).replace('\\', '/')
>     csvfile = os.path.join(currdir, dirname, filename)
>     csvdata = CsvImport(csvfile, company, start, finish)
>
> [2] ... , 48, 48, 48, 48, 37, 44, 34, 34, 44, 44, 34, 34, 44, 34, 34, 44,
> 34, 65, 99, 117, 116, 101, 32, 72, 97, 122, 97, 114, 100, 32, 84, 111, 32,
> 84, 104, 101, 32, 65, 113, 117, 97, 116, 105, 99, 32, 69, 110, 118, 105,
> 114, 111, 110, 109, 101, 110, 116, 46, 34, 44, 44, 44, 44, 44, 44, 44, 48,
> 46, 48, 48, 48, 48, 48, 37, 44, 34, 34, 44, 44, 34, 34, 44, 34, 34, 44, 34,
> 34, 44, 34, 34, 44, 34, 67, 104, 114, 111, 110, 105, 99, 32, 72, 97, 122,
> 97, 114, 100, 32, 84, 111, 32, 84, 104, 101, 32, 65, 113, 117, 97, 116,
> 105, 99, 32, 69, 110, 118, 105, 114, 111,  110, 109, 101, 110, 116, 46, 34,
> 44, 50, 44, 34, 78, 47, 65, 34, 44, 34, 71, 72, 83, 48, 57, 34, 44, 34, 72,
> 52, 49, 49, 34, 44, 44, 44, 48, 46, 48, 48, 48, 48, 48, 37, 44, 34, 34, 44,
> 44, 34, 34, 44, 34, 34, 44, 34, 34, 44, 34, 34, 44, 34, 34, 44, 34, 72, 97,
> 122, 97, 114, 100, 111, 117, 115, 32, 84, 111, 32, 84, 104, 101, 32, 79,
> 122, 111, 110, 101, 32, 76, 97, 121, 101, 114, 46, 34, 44, 44, 44, 44, 44,
> 48, 46, 48, 48, 48, 48, 48, 37, 44, 34, 34, 44, 34, 34, 44, 34, 65, 100,
> 100, 105, 116, 105, 111, 110, 97, 108, 32, 78, 111, 110, 45, 71, 72, 83,
> 32, 72, 97, 122, 97, 114, 100, 32, 83, 116, 97, 116, 101, 109, 101, 110,
> 116, 34, 44, 34, 65, 85, 72, 48, 54, 54, 34, 44, 48, 46, 48, 48, 48, 48,
> 48, 37, 44, 34, 34, 10]
>
>
>
>
>
> _______________________________________________
> melbourne-pug mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/melbourne-pug
>
>

