Re: [melbourne-pug] Unicode for windows dummies

Mike Dewhirst Tue, 16 Aug 2016 17:25:34 -0700

On 16/08/2016 11:16 AM, David Micallef wrote:

Hi Mike
I use the Linux program iconv before importing CSV files for these issues. Even though the CSVs are supposed to be UTF-8, I find systems sometimes slip in something that is not. The -c argument ignores errors and moves right along, removing the problem parts from the output.
The following function converts from UTF-8 to UTF-8, which seems pointless, though it works because of the -c.


Thanks David. I'll need something like this for deploying on Linux ...

Cheers

mike


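# Assumes "from os import system" and a module-level logger named LOG.
def _convert_to_utf8(self):
    old_file_path = self.file_path
    # Note: replace() rewrites every '.' in the path, not only the extension.
    self.file_path = old_file_path.replace('.', '-utf8.')
    LOG.info('Converting to UTF8, new file: %s' % self.file_path)
    # -c tells iconv to drop characters it cannot convert instead of failing.
    cmd = ' '.join(['iconv', '-f', 'UTF-8', '-t', 'UTF-8', '-c',
                    old_file_path, '>', self.file_path])
    LOG.info(cmd)
    system(cmd)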
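
For a machine without iconv (the Windows side of this thread, for instance), roughly the same effect can be had in pure Python. The following is only a sketch: the function name `strip_invalid_utf8` and the demo file names are made up for illustration, and it reads the whole file into memory, which may not suit very large imports.

```python
import io
import os
import tempfile


def strip_invalid_utf8(src_path, dst_path):
    """Copy src_path to dst_path, dropping bytes that are not valid UTF-8."""
    with io.open(src_path, "rb") as src:
        raw = src.read()
    # errors="ignore" silently discards undecodable bytes, much like iconv -c.
    text = raw.decode("utf-8", errors="ignore")
    with io.open(dst_path, "wb") as dst:
        dst.write(text.encode("utf-8"))
    return dst_path


# Demo with a throwaway file containing two stray bytes (0xff, 0xfe).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "in.csv")
dst = os.path.join(workdir, "in-utf8.csv")
with io.open(src, "wb") as handle:
    handle.write(b"ok,\xff\xfebad,caf\xc3\xa9\n")

strip_invalid_utf8(src, dst)
with io.open(dst, "rb") as handle:
    cleaned = handle.read()
```

As with `iconv -c`, anything dropped is gone for good, so it is worth keeping the original file around.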

On Tue, 16 Aug 2016 at 11:04 Mike Dewhirst <[email protected]> wrote:


    If anyone can point me to the appropriate advice for resolving the
    error below I would be most appreciative. Really very appreciative.

    I think I understand Unicode in theory and have reread a lot of
    articles including ...

    * https://docs.python.org/3/library/codecs.html#encodings-and-unicode
    * https://pythonconquerstheuniverse.wordpress.com/2010/05/30/unicode-beginners-introduction-for-dummies-made-simple/
    * https://pythonconquerstheuniverse.wordpress.com/2010/06/04/unicode-for-dummies-just-use-utf-8/
    * https://en.wikipedia.org/wiki/UTF-8

    This is the error which has stumped me ...

    (xxex3) C:\Users\mike\env\xxex3\ssds>python substance/data_imports/map_csv.py
    Traceback (most recent call last):
      File "substance/data_imports/map_csv.py", line 139, in <module>
        csvdata = CsvImport(csvfile, company, start, finish)
      File "substance/data_imports/map_csv.py", line 127, in __init__
        print("%s" % cells)
      File "C:\Users\mike\env\xxex3\lib\encodings\cp850.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u2030' in position 7452: character maps to <undefined>
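
The last frames of the traceback are in `print` and `encodings\cp850.py`, which suggests the failure happens while encoding the text for a cp850 console rather than while reading the file. U+2030 is the per mille sign, and the error says cp850 has no mapping for it. A minimal sketch of the failure and of two tolerant alternatives follows; the helper name `encode_for_console` is hypothetical and written for Python 3.

```python
PER_MILLE = "\u2030"  # the character named in the traceback


def encode_for_console(text, encoding="cp850", errors="backslashreplace"):
    """Encode text for a console code page without raising on unmappable characters."""
    return text.encode(encoding, errors)


# Strict encoding (the default used by print) fails for this character.
try:
    PER_MILLE.encode("cp850")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# backslashreplace keeps the information as an escape sequence.
escaped = encode_for_console("5" + PER_MILLE)

# replace substitutes a question mark for anything the code page lacks.
replaced = encode_for_console("5" + PER_MILLE, errors="replace")
```

Either error handler only changes what is shown on the console; the data held in memory is unaffected.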


    I have saved the csv file involved as utf-8 using LibreOffice 5 on
    Windows 8.1 from the original Microsoft Excel spreadsheet.

    This is in Python 3.5 on Windows but it also needs to run in
    Python 2.7 on Ubuntu 14.04 server (no gui).

    map_csv.py [1] is the beginning of a module I want to develop into
    a generic data import facility. I'm starting with a specific csv
    file I need to import (not mine and its contents are private) and
    all it does at the moment is read in the file and print the lines
    to stdout.

    I have tried utf-8 encoding each line and that gets past the error
    but just produces a list of numbers, a snippet of which is below [2].
    Decoding that as utf-8 reproduces the error, as might be expected.
    I have also tried decoding as utf-16 and encoding it as utf-8 but
    that didn't work either.

    Thanks for reading this far

    Mike

    [1] ...

    from __future__ import unicode_literals

    import os

    class CsvImport(object):
        """ Imports a csv file and converts it into a list of lists """
        def __init__(self, csvfile, company, start, finish):
            self.company = company
            self.rows = list()
            with open(csvfile, "r") as csv:
                i = 0
                self.rows = csv.readlines()
                for line in self.rows:
                    i += 1
                    cells = list(line)
                    if i >= start:
                        print("%s" % cells)
                    if i > finish:
                        break

    if __name__ == "__main__":
        company = "Calia Pty Ltd"
        dirname = "{0}/csv".format(company.split()[0].lower())
        filename = "{0}1.csv".format(company.split()[0].lower())
        start = 105
        finish = 404
        currdir = os.path.realpath(os.path.dirname(__file__)).replace('\\', '/')
        csvfile = os.path.join(currdir, dirname, filename)
        csvdata = CsvImport(csvfile, company, start, finish)

    [2] ... , 48, 48, 48, 48, 37, 44, 34, 34, 44, 44, 34, 34, 44, 34,
    34, 44, 34, 65, 99, 117, 116, 101, 32, 72, 97, 122, 97, 114, 100,
    32, 84, 111, 32, 84, 104, 101, 32, 65, 113, 117, 97, 116, 105, 99,
    32, 69, 110, 118, 105, 114, 111, 110, 109, 101, 110, 116, 46, 34,
    44, 44, 44, 44, 44, 44, 44, 48, 46, 48, 48, 48, 48, 48, 37, 44,
    34, 34, 44, 44, 34, 34, 44, 34, 34, 44, 34, 34, 44, 34, 34, 44,
    34, 67, 104, 114, 111, 110, 105, 99, 32, 72, 97, 122, 97, 114,
    100, 32, 84, 111, 32, 84, 104, 101, 32, 65, 113, 117, 97, 116,
    105, 99, 32, 69, 110, 118, 105, 114, 111, 110, 109, 101, 110,
    116, 46, 34, 44, 50, 44, 34, 78, 47, 65, 34, 44, 34, 71, 72, 83,
    48, 57, 34, 44, 34, 72, 52, 49, 49, 34, 44, 44, 44, 48, 46, 48,
    48, 48, 48, 48, 37, 44, 34, 34, 44, 44, 34, 34, 44, 34, 34, 44,
    34, 34, 44, 34, 34, 44, 34, 34, 44, 34, 72, 97, 122, 97, 114, 100,
    111, 117, 115, 32, 84, 111, 32, 84, 104, 101, 32, 79, 122, 111,
    110, 101, 32, 76, 97, 121, 101, 114, 46, 34, 44, 44, 44, 44, 44,
    48, 46, 48, 48, 48, 48, 48, 37, 44, 34, 34, 44, 34, 34, 44, 34,
    65, 100, 100, 105, 116, 105, 111, 110, 97, 108, 32, 78, 111, 110,
    45, 71, 72, 83, 32, 72, 97, 122, 97, 114, 100, 32, 83, 116, 97,
    116, 101, 109, 101, 110, 116, 34, 44, 34, 65, 85, 72, 48, 54, 54,
    34, 44, 48, 46, 48, 48, 48, 48, 48, 37, 44, 34, 34, 10]
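
The numbers above look like byte values. In Python 3, encoding a line gives a `bytes` object, and calling `list()` on it yields one integer per byte, which would explain this output. A small sketch using a few values copied from the list shows that the text is still there and decodes cleanly:

```python
# A few consecutive-looking values copied from the list above.
codes = [34, 65, 99, 117, 116, 101, 32, 72, 97, 122, 97, 114, 100]

# bytearray rebuilds the raw bytes; decode turns them back into text.
text = bytearray(codes).decode("utf-8")

# Going the other way reproduces the integers (Python 3 behaviour).
round_trip = list(text.encode("utf-8"))
```

Here `text` is the start of a quoted cell, `"Acute Hazard`, so the encode step itself did not damage the data.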




    _______________________________________________
    melbourne-pug mailing list
    [email protected] <mailto:[email protected]>
    https://mail.python.org/mailman/listinfo/melbourne-pug


