Hi Mike

I run the Linux program iconv on CSV files before importing them to deal
with these issues. Even though the CSVs are supposed to be UTF-8, I find
systems sometimes slip in something that is not. The -c argument makes
iconv skip invalid characters and carry on, dropping the problem parts from
the output.

The following function converts from UTF-8 to UTF-8, which seems pointless,
but it works because of the -c.

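import os
import subprocess

def _convert_to_utf8(self):
    old_file_path = self.file_path
    # Insert '-utf8' before the extension only; replace('.') hits every dot.
    root, ext = os.path.splitext(old_file_path)
    self.file_path = root + '-utf8' + ext
    LOG.info('Converting to UTF8, new file: %s', self.file_path)
    cmd = ['iconv', '-f', 'UTF-8', '-t', 'UTF-8', '-c', old_file_path]
    LOG.info(' '.join(cmd))
    # No shell involved: send iconv's stdout straight to the new file.
    with open(self.file_path, 'wb') as out:
        subprocess.call(cmd, stdout=out)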

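If iconv is not available (on Windows, say), roughly the same cleanup can
be sketched in pure Python. This is only a sketch, assuming the file is
small enough to read into memory; strip_invalid_utf8 and its two arguments
are names made up for illustration. As far as I know it should behave the
same on Python 2.7 and 3.5.

```python
import io


def strip_invalid_utf8(src_path, dst_path):
    """Copy src_path to dst_path, dropping bytes that are not valid UTF-8."""
    with io.open(src_path, 'rb') as src:
        raw = src.read()
    # The 'ignore' error handler silently discards undecodable bytes,
    # which is roughly what iconv -c does.
    text = raw.decode('utf-8', 'ignore')
    # newline='' writes line endings exactly as they were read.
    with io.open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        dst.write(text)
    return text
```

Like the iconv version, this drops the bad bytes without telling you where
they were, so it is worth keeping the original file around.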

On Tue, 16 Aug 2016 at 11:04 Mike Dewhirst <[email protected]> wrote:

> If anyone can point me to the appropriate advice for resolving the error
> below I would be most appreciative. Really very appreciative.
>
> I think I understand Unicode in theory and have reread a lot of articles
> including ...
>
> * https://docs.python.org/3/library/codecs.html#encodings-and-unicode
> * https://pythonconquerstheuniverse.wordpress.com/2010/05/30/unicode-beginners-introduction-for-dummies-made-simple/
> * https://pythonconquerstheuniverse.wordpress.com/2010/06/04/unicode-for-dummies-just-use-utf-8/
> * https://en.wikipedia.org/wiki/UTF-8
>
> This is the error which has stumped me ...
>
> (xxex3) C:\Users\mike\env\xxex3\ssds>python substance/data_imports/map_csv.py
> Traceback (most recent call last):
>   File "substance/data_imports/map_csv.py", line 139, in <module>
>     csvdata = CsvImport(csvfile, company, start, finish)
>   File "substance/data_imports/map_csv.py", line 127, in __init__
>     print("%s" % cells)
>   File "C:\Users\mike\env\xxex3\lib\encodings\cp850.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u2030' in position 7452: character maps to <undefined>
>
> I have saved the csv file involved as utf-8 using LibreOffice 5 on Windows
> 8.1 from the original Microsoft Excel spreadsheet.
>
> This is in Python 3.5 on Windows but it also needs to run in Python 2.7 on
> Ubuntu 14.04 server (no gui).
>
> map_csv.py [1] is the beginning of a module I want to develop into a
> generic data import facility. I'm starting with a specific csv file I need
> to import (not mine and its contents are private) and all it does at the
> moment is read in the file and print the lines to stdout.
>
> I have tried utf-8 encoding each line and that gets past the error but
> just produces a set of chars, a snippet of which is below [2]. Decoding that
> as utf-8 reproduces the error, as might be expected. I have also tried
> decoding as utf-16 and encoding it as utf-8 but that didn't work either.
>
> Thanks for reading this far
>
> Mike
>
> [1] ...
>
> from __future__ import unicode_literals
>
> import os
>
> class CsvImport(object):
>     """ Imports a csv file and converts it into a list of lists """
>
>     def __init__(self, csvfile, company, start, finish):
>         self.company = company
>         self.rows = list()
>         with open(csvfile, "r") as csv:
>             i = 0
>             self.rows = csv.readlines()
>             for line in self.rows:
>                 i += 1
>                 cells = list(line)
>                 if i >= start:
>                     print("%s" % cells)
>                 if i > finish:
>                     break
>
> if __name__ == "__main__":
>     company = "Calia Pty Ltd"
>     dirname = "{0}/csv".format(company.split()[0].lower())
>     filename = "{0}1.csv".format(company.split()[0].lower())
>     start = 105
>     finish = 404
>     currdir = os.path.realpath(os.path.dirname(__file__)).replace('\\', '/')
>     csvfile = os.path.join(currdir, dirname, filename)
>     csvdata = CsvImport(csvfile, company, start, finish)
>
> [2] ... , 48, 48, 48, 48, 37, 44, 34, 34, 44, 44, 34, 34, 44, 34, 34, 44,
> 34, 65, 99, 117, 116, 101, 32, 72, 97, 122, 97, 114, 100, 32, 84, 111, 32,
> 84, 104, 101, 32, 65, 113, 117, 97, 116, 105, 99, 32, 69, 110, 118, 105,
> 114, 111, 110, 109, 101, 110, 116, 46, 34, 44, 44, 44, 44, 44, 44, 44, 48,
> 46, 48, 48, 48, 48, 48, 37, 44, 34, 34, 44, 44, 34, 34, 44, 34, 34, 44, 34,
> 34, 44, 34, 34, 44, 34, 67, 104, 114, 111, 110, 105, 99, 32, 72, 97, 122,
> 97, 114, 100, 32, 84, 111, 32, 84, 104, 101, 32, 65, 113, 117, 97, 116,
> 105, 99, 32, 69, 110, 118, 105, 114, 111,  110, 109, 101, 110, 116, 46, 34,
> 44, 50, 44, 34, 78, 47, 65, 34, 44, 34, 71, 72, 83, 48, 57, 34, 44, 34, 72,
> 52, 49, 49, 34, 44, 44, 44, 48, 46, 48, 48, 48, 48, 48, 37, 44, 34, 34, 44,
> 44, 34, 34, 44, 34, 34, 44, 34, 34, 44, 34, 34, 44, 34, 34, 44, 34, 72, 97,
> 122, 97, 114, 100, 111, 117, 115, 32, 84, 111, 32, 84, 104, 101, 32, 79,
> 122, 111, 110, 101, 32, 76, 97, 121, 101, 114, 46, 34, 44, 44, 44, 44, 44,
> 48, 46, 48, 48, 48, 48, 48, 37, 44, 34, 34, 44, 34, 34, 44, 34, 65, 100,
> 100, 105, 116, 105, 111, 110, 97, 108, 32, 78, 111, 110, 45, 71, 72, 83,
> 32, 72, 97, 122, 97, 114, 100, 32, 83, 116, 97, 116, 101, 109, 101, 110,
> 116, 34, 44, 34, 65, 85, 72, 48, 54, 54, 34, 44, 48, 46, 48, 48, 48, 48,
> 48, 37, 44, 34, 34, 10]
>
> _______________________________________________
> melbourne-pug mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/melbourne-pug
>