def _convert_to_utf8(self):
    old_file_path = self.file_path
    self.file_path = old_file_path.replace('.', '-utf8.')
    LOG.info('Converting to UTF8, new file: %s' % self.file_path)
    cmd = ' '.join(['iconv', '-f', 'UTF-8', '-t', 'UTF-8', '-c',
                    old_file_path, '>', self.file_path])
    LOG.info(cmd)
    system(cmd)
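
If iconv is not available (for example on the Windows machine), roughly the same clean-up can be done in pure Python. This is only a minimal sketch, not the method above: convert_to_utf8 is a hypothetical standalone helper, and building the new name with os.path.splitext instead of str.replace is an assumption about the intended naming.

```python
import io
import os


def convert_to_utf8(old_file_path):
    """Write a copy of old_file_path with invalid UTF-8 bytes dropped."""
    # splitext only touches the final extension, so dots elsewhere in
    # the path (directory names, for instance) are left alone.
    root, ext = os.path.splitext(old_file_path)
    new_file_path = root + '-utf8' + ext

    with io.open(old_file_path, 'rb') as src:
        raw = src.read()

    # errors='ignore' discards undecodable bytes, similar to iconv -c.
    text = raw.decode('utf-8', errors='ignore')

    with io.open(new_file_path, 'wb') as dst:
        dst.write(text.encode('utf-8'))

    return new_file_path
```

Like the iconv call, this silently drops bad bytes, so it is worth comparing file sizes afterwards to see whether anything was lost.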
On Tue, 16 Aug 2016 at 11:04 Mike Dewhirst wrote:
If anyone can point me to the appropriate advice for resolving the
error below I would be most appreciative. Really very appreciative.
I think I understand Unicode in theory and have reread a lot of
articles including ...
* https://docs.python.org/3/library/codecs.html#encodings-and-unicode
* https://pythonconquerstheuniverse.wordpress.com/2010/05/30/unicode-beginners-introduction-for-dummies-made-simple/
* https://pythonconquerstheuniverse.wordpress.com/2010/06/04/unicode-for-dummies-just-use-utf-8/
* https://en.wikipedia.org/wiki/UTF-8
This is the error which has stumped me ...
(xxex3) C:\Users\mike\env\xxex3\ssds>python substance/data_imports/map_csv.py
Traceback (most recent call last):
  File "substance/data_imports/map_csv.py", line 139, in <module>
    csvdata = CsvImport(csvfile, company, start, finish)
  File "substance/data_imports/map_csv.py", line 127, in __init__
    print("%s" % cells)
  File "C:\Users\mike\env\xxex3\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2030' in position 7452: character maps to <undefined>
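
The last line of that traceback can be reproduced without the csv file. A small sketch using only the standard codecs: the character is U+2030 (PER MILLE SIGN), which the console's cp850 code page has no slot for, so the failure happens when print() encodes for the console rather than when the file is read.

```python
per_mille = u'\u2030'  # PER MILLE SIGN, the character named in the traceback

try:
    per_mille.encode('cp850')
except UnicodeEncodeError as exc:
    # Same failure as in the traceback, minus the console.
    print('cp850 cannot encode it: %s' % exc.reason)

# UTF-8 can represent every code point, so this succeeds.
utf8_bytes = per_mille.encode('utf-8')

# With errors='replace' the unencodable character becomes '?'.
fallback = per_mille.encode('cp850', errors='replace')
```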
I have saved the csv file involved as utf-8 using LibreOffice 5 on
Windows 8.1, starting from the original Microsoft Excel spreadsheet.
This is in Python 3.5 on Windows but it also needs to run in
Python 2.7 on Ubuntu 14.04 server (no gui).
map_csv.py [1] is the beginning of a module I want to develop into
a generic data import facility. I'm starting with a specific csv
file I need to import (not mine and its contents are private) and
all it does at the moment is read in the file and print the lines
to stdout.
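
One way to read and print the file on both Python 3.5 and Python 2.7 might look like the sketch below. It assumes the file really is UTF-8; read_lines and printable are hypothetical helper names, not part of map_csv.py.

```python
import io
import sys


def read_lines(path, encoding='utf-8'):
    """Return the file's lines as unicode text on Python 2.7 and 3.x."""
    # io.open accepts an encoding argument on both major versions; the
    # built-in open() on Python 2 does not.
    with io.open(path, 'r', encoding=encoding) as handle:
        return handle.readlines()


def printable(text, encoding=None):
    """Escape characters that the target encoding cannot represent."""
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    # backslashreplace turns an unencodable character into its \uXXXX
    # escape, so nothing is lost and print() cannot raise on it.
    return text.encode(encoding, errors='backslashreplace').decode(encoding)
```

With these, print(printable(line)) shows \u2030 on a cp850 console instead of raising, while a UTF-8 terminal on Ubuntu shows the character itself.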
I have tried utf-8 encoding each line, and that gets past the error
but just produces a list of character codes, a snippet of which is
below [2]. Decoding that as utf-8 reproduces the error, as might be
expected.
I have also tried decoding as utf-16 and encoding it as utf-8 but
that didn't work either.
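
The numbers in [2] are most likely just the encoded bytes shown one at a time: on Python 3, iterating over a bytes object yields integers, so list() of an encoded line gives byte values rather than text. A short sketch with a made-up sample line (not taken from the private file):

```python
line = u'0.00000%,""'            # hypothetical sample, shaped like the data
encoded = line.encode('utf-8')   # bytes, not text

# bytearray yields integers on both Python 2.7 and 3.x.
as_ints = list(bytearray(encoded))

# Going the other way recovers the original text.
recovered = bytes(bytearray(as_ints)).decode('utf-8')

# The same round trip applied to a run of numbers from snippet [2].
snippet = [65, 99, 117, 116, 101, 32, 72, 97, 122, 97, 114, 100]
decoded_snippet = bytes(bytearray(snippet)).decode('utf-8')
```

So the data survives encoding intact; the error only comes back because the decoded text is printed to the cp850 console again.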
Thanks for reading this far
Mike
[1] ...
from __future__ import unicode_literals
import os
class CsvImport(object):
   """ Imports a csv file and converts it into a list of lists """
   def __init__(self, csvfile, company, start, finish):
       self.company = company
       self.rows = list()
       with open(csvfile, "r") as csv:
           i = 0
           self.rows = csv.readlines()
           for line in self.rows:
               i += 1
               cells = list(line)
               if i >= start:
                   print("%s" % cells)
               if i > finish:
                   break
if __name__ == "__main__":
   company = "Calia Pty Ltd"
   dirname = "{0}/csv".format(company.split()[0].lower())
   filename = "{0}1.csv".format(company.split()[0].lower())
   start = 105
   finish = 404
   currdir = os.path.realpath(os.path.dirname(__file__)).replace('\\', '/')
   csvfile = os.path.join(currdir, dirname, filename)
   csvdata = CsvImport(csvfile, company, start, finish)
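
Note that list(line) in the listing above splits a line into single characters, not cells. A minimal sketch of reading actual cells with the standard csv module follows. It targets Python 3 only (the Python 2.7 csv module expects bytes), read_rows is a hypothetical name, and treating start..finish as an inclusive 1-based range is an assumption.

```python
import csv
import io


def read_rows(path, start, finish, encoding='utf-8'):
    """Return rows start..finish (1-based, inclusive) as lists of cells."""
    rows = []
    # newline='' lets the csv module handle line endings inside quoted cells.
    with io.open(path, 'r', encoding=encoding, newline='') as handle:
        for number, cells in enumerate(csv.reader(handle), 1):
            if number > finish:
                break
            if number >= start:
                rows.append(cells)
    return rows
```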
[2] ... , 48, 48, 48, 48, 37, 44, 34, 34, 44, 44, 34, 34, 44, 34,
34, 44, 34, 65, 99, 117, 116, 101, 32, 72, 97, 122, 97, 114, 100,
32, 84, 111, 32, 84, 104, 101, 32, 65, 113, 117, 97, 116, 105, 99,
32, 69, 110, 118, 105, 114, 111, 110, 109, 101, 110, 116, 46, 34,
44, 44, 44, 44, 44, 44, 44, 48, 46, 48, 48, 48, 48, 48, 37, 44,
34, 34, 44, 44, 34, 34, 44, 34, 34, 44, 34, 34, 44, 34, 34, 44,
34, 67, 104, 114, 111, 110, 105, 99, 32, 72, 97, 122, 97, 114,
100, 32, 84, 111, 32, 84, 104, 101, 32, 65, 113, 117, 97, 116,
105, 99, 32, 69, 110, 118, 105, 114, 111, 110, 109, 101, 110,
116, 46, 34, 44, 50, 44, 34, 78, 47, 65, 34, 44, 34, 71, 72, 83,
48, 57, 34, 44, 34, 72, 52, 49, 49, 34, 44, 44, 44, 48, 46, 48,
48, 48, 48, 48, 37, 44, 34, 34, 44, 44, 34, 34, 44, 34, 34, 44,
34, 34, 44, 34, 34, 44, 34, 34, 44, 34, 72, 97, 122, 97, 114, 100,
111, 117, 115, 32, 84, 111, 32, 84, 104, 101, 32, 79, 122, 111,
110, 101, 32, 76, 97, 121, 101, 114, 46, 34, 44, 44, 44, 44, 44,
48, 46, 48, 48, 48, 48, 48, 37, 44, 34, 34, 44, 34, 34, 44, 34,
65, 100, 100, 105, 116, 105, 111, 110, 97, 108, 32, 78, 111, 110,
45, 71, 72, 83, 32, 72, 97, 122, 97, 114, 100, 32, 83, 116, 97,
116, 101, 109, 101, 110, 116, 34, 44, 34, 65, 85, 72, 48, 54, 54,
34, 44, 48, 46, 48, 48, 48, 48, 48, 37, 44, 34, 34, 10]
_______________________________________________
melbourne-pug mailing list
[email protected] <mailto:[email protected]>
https://mail.python.org/mailman/listinfo/melbourne-pug