On 16/08/2016 11:16 AM, David Micallef wrote:
Hi Mike

I run the Linux program iconv over csv files before importing them, to deal with these issues. Even though the csv files are supposed to be UTF-8, I find that systems sometimes slip in something that is not. The -c argument tells iconv to ignore errors and carry on, dropping the problem characters from the output.

The following function converts from UTF-8 to UTF-8, which seems pointless, but it works because of the -c.

Thanks David. I'll need something like this for deploying on Linux ...

Cheers

mike

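from os import system

# LOG is assumed to be a module-level logging.Logger

def _convert_to_utf8(self):
    old_file_path = self.file_path
    # Note: replace() changes every '.' in the path, not just the extension
    self.file_path = old_file_path.replace('.', '-utf8.')
    LOG.info('Converting to UTF8, new file: %s' % self.file_path)
    # -c makes iconv drop byte sequences that are not valid UTF-8
    cmd = ' '.join(['iconv', '-f', 'UTF-8', '-t', 'UTF-8', '-c',
                    old_file_path, '>', self.file_path])
    LOG.info(cmd)
    system(cmd)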
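
For anyone who also needs this on Windows, where iconv is usually not installed, the same idea can be sketched in plain Python. This is only a rough equivalent of the iconv call above, not a drop-in replacement: convert_to_utf8 and its two path arguments are names made up for this example, and it reads the whole file into memory, which assumes the csv is not huge.

```python
import io


def convert_to_utf8(src_path, dst_path):
    """Copy src_path to dst_path, dropping bytes that are not valid UTF-8."""
    # Read raw bytes so that decoding errors are under our control.
    with io.open(src_path, 'rb') as src:
        raw = src.read()
    # errors='ignore' silently discards undecodable bytes, much like iconv -c.
    text = raw.decode('utf-8', errors='ignore')
    with io.open(dst_path, 'wb') as dst:
        dst.write(text.encode('utf-8'))
    return dst_path
```

As with iconv -c, the bad bytes disappear without any warning, so it may be worth comparing file sizes before and after if losing data matters.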

On Tue, 16 Aug 2016 at 11:04 Mike Dewhirst wrote:

    If anyone can point me to the appropriate advice for resolving the
    error below I would be most appreciative. Really very appreciative.

    I think I understand Unicode in theory and have reread a lot of
    articles including ...

    * https://docs.python.org/3/library/codecs.html#encodings-and-unicode
    * https://pythonconquerstheuniverse.wordpress.com/2010/05/30/unicode-beginners-introduction-for-dummies-made-simple/
    * https://pythonconquerstheuniverse.wordpress.com/2010/06/04/unicode-for-dummies-just-use-utf-8/
    * https://en.wikipedia.org/wiki/UTF-8

    This is the error which has stumped me ...

    (xxex3) C:\Users\mike\env\xxex3\ssds>python substance/data_imports/map_csv.py

    Traceback (most recent call last):
      File "substance/data_imports/map_csv.py", line 139, in <module>
        csvdata = CsvImport(csvfile, company, start, finish)
      File "substance/data_imports/map_csv.py", line 127, in __init__
        print("%s" % cells)
      File "C:\Users\mike\env\xxex3\lib\encodings\cp850.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u2030' in position 7452: character maps to <undefined>
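
A note on the traceback: the failure happens inside print(), when Python encodes the text for the cp850 console, which has no mapping for U+2030 (the per mille sign). One way around it, sketched below, is to escape whatever the console cannot show before printing. console_safe is a name invented for this example, and falling back to ASCII when stdout reports no encoding is an assumption of the sketch. Setting the PYTHONIOENCODING environment variable is another option that may suit better.

```python
from __future__ import print_function, unicode_literals

import sys


def console_safe(text, encoding=None):
    """Return text with characters the console cannot show escaped."""
    # Fall back to ASCII when stdout reports no encoding (an assumption).
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    # backslashreplace turns unencodable characters into \uXXXX escapes.
    return text.encode(encoding, errors='backslashreplace').decode(encoding)


print(console_safe('0.5\u2030', encoding='cp850'))
```

The data itself is untouched; only the copy sent to the console is escaped.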


    I saved the csv file involved as UTF-8 using LibreOffice 5 on
    Windows 8.1, from the original Microsoft Excel spreadsheet.

    This is in Python 3.5 on Windows but it also needs to run in
    Python 2.7 on Ubuntu 14.04 server (no gui).

    map_csv.py [1] is the beginning of a module I want to develop into
    a generic data import facility. I'm starting with a specific csv
    file I need to import (not mine and its contents are private) and
    all it does at the moment is read in the file and print the lines
    to stdout.

    I have tried utf-8 encoding each line, and that gets past the error
    but just produces a list of character codes, a snippet of which is
    below [2]. Decoding that as utf-8 reproduces the error, as might be
    expected. I have also tried decoding as utf-16 and encoding as
    utf-8, but that didn't work either.
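
The list of numbers is most likely what list() does to an encoded line: on Python 3, iterating over bytes gives integer byte values, and on an unencoded line list() gives single characters rather than csv cells. The sketch below shows the difference on a made-up sample line (not taken from the private file) and uses the standard csv module for the actual splitting. It is written for Python 3; the csv module on Python 2.7 handles text differently.

```python
import csv
import io

line = '0.00000%,"","Acute Hazard To The Aquatic Environment.",2\n'

# list() on a str splits it into single characters, not csv cells.
chars = list(line)

# Encoding first and then calling list() yields integer byte values.
byte_values = list(line.encode('utf-8'))

# csv.reader does the real splitting into cells and honours the quotes.
cells = next(csv.reader(io.StringIO(line)))
print(cells)
```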

    Thanks for reading this far

    Mike

    [1] ...

    from __future__ import unicode_literals

    import os

    class CsvImport(object):
        """ Imports a csv file and converts it into a list of lists """

        def __init__(self, csvfile, company, start, finish):
            self.company = company
            self.rows = list()
            with open(csvfile, "r") as csv:
                i = 0
                self.rows = csv.readlines()
                for line in self.rows:
                    i += 1
                    cells = list(line)
                    if i >= start:
                        print("%s" % cells)
                    if i > finish:
                        break

    if __name__ == "__main__":
        company = "Calia Pty Ltd"
        dirname = "{0}/csv".format(company.split()[0].lower())
        filename = "{0}1.csv".format(company.split()[0].lower())
        start = 105
        finish = 404
        currdir = os.path.realpath(os.path.dirname(__file__)).replace('\\', '/')
        csvfile = os.path.join(currdir, dirname, filename)
        csvdata = CsvImport(csvfile, company, start, finish)
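
One thing to watch in the listing above: open(csvfile, "r") does not name an encoding, so Python 3 on Windows falls back to the locale's preferred encoding rather than UTF-8, and Python 2.7 returns undecoded bytes. A minimal sketch of reading the same range of lines with an explicit encoding on both versions follows. read_rows is a hypothetical helper, not part of map_csv.py, and it treats start and finish as 1-based and inclusive, which differs slightly from the loop above.

```python
import io


def read_rows(csvfile, start, finish, encoding='utf-8'):
    """Return lines start..finish (1-based, inclusive) as decoded text."""
    rows = []
    # io.open takes an encoding argument on both Python 2.7 and 3.x.
    with io.open(csvfile, 'r', encoding=encoding) as handle:
        for i, line in enumerate(handle, 1):
            if i > finish:
                break
            if i >= start:
                rows.append(line)
    return rows
```

Called as read_rows(csvfile, 105, 404), it returns text lines on either Python version, leaving the printing or parsing to a later step.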

    [2] ... , 48, 48, 48, 48, 37, 44, 34, 34, 44, 44, 34, 34, 44, 34,
    34, 44, 34, 65, 99, 117, 116, 101, 32, 72, 97, 122, 97, 114, 100,
    32, 84, 111, 32, 84, 104, 101, 32, 65, 113, 117, 97, 116, 105, 99,
    32, 69, 110, 118, 105, 114, 111, 110, 109, 101, 110, 116, 46, 34,
    44, 44, 44, 44, 44, 44, 44, 48, 46, 48, 48, 48, 48, 48, 37, 44,
    34, 34, 44, 44, 34, 34, 44, 34, 34, 44, 34, 34, 44, 34, 34, 44,
    34, 67, 104, 114, 111, 110, 105, 99, 32, 72, 97, 122, 97, 114,
    100, 32, 84, 111, 32, 84, 104, 101, 32, 65, 113, 117, 97, 116,
    105, 99, 32, 69, 110, 118, 105, 114, 111, 110, 109, 101, 110,
    116, 46, 34, 44, 50, 44, 34, 78, 47, 65, 34, 44, 34, 71, 72, 83,
    48, 57, 34, 44, 34, 72, 52, 49, 49, 34, 44, 44, 44, 48, 46, 48,
    48, 48, 48, 48, 37, 44, 34, 34, 44, 44, 34, 34, 44, 34, 34, 44,
    34, 34, 44, 34, 34, 44, 34, 34, 44, 34, 72, 97, 122, 97, 114, 100,
    111, 117, 115, 32, 84, 111, 32, 84, 104, 101, 32, 79, 122, 111,
    110, 101, 32, 76, 97, 121, 101, 114, 46, 34, 44, 44, 44, 44, 44,
    48, 46, 48, 48, 48, 48, 48, 37, 44, 34, 34, 44, 34, 34, 44, 34,
    65, 100, 100, 105, 116, 105, 111, 110, 97, 108, 32, 78, 111, 110,
    45, 71, 72, 83, 32, 72, 97, 122, 97, 114, 100, 32, 83, 116, 97,
    116, 101, 109, 101, 110, 116, 34, 44, 34, 65, 85, 72, 48, 54, 54,
    34, 44, 48, 46, 48, 48, 48, 48, 48, 37, 44, 34, 34, 10]




    _______________________________________________
    melbourne-pug mailing list
    [email protected] <mailto:[email protected]>
    https://mail.python.org/mailman/listinfo/melbourne-pug


