Oh well ... maybe it isn't Python's fault. I just looked at the data
input file and found the ³ character in all places had been turned into
a box. When I edited the boxes back into ³ it all went well.
I used Filezilla to get the input files across so I'll focus on that next.
Sorry to interrupt your long weekend.
Cheers
Mike
On 8/03/2020 5:30 pm, Mike Dewhirst wrote:
I'm now exclusively Python 3.6+ thank heavens but ...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb3 in position 6500: invalid start byte
It just so happens that is the superscript 3 character. It also
happens that superscript 3 displays correctly and works properly on
Windows 10 but causes the above error on Ubuntu 18.04. I'm not paid
enough to understand why - hence this email if anyone can help.
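
For anyone wanting to reproduce this, here is a minimal sketch of how that exact error can arise. It assumes (and this is only an assumption) that the file was saved in a single-byte Windows code page such as cp1252 rather than UTF-8. In cp1252 and Latin-1 the superscript 3 is the single byte 0xB3, whereas in UTF-8 it is the two bytes 0xC2 0xB3, so a lone 0xB3 is not a valid start byte. The sample string and variable names are made up for illustration.

```python
# Assumption: the input file was written as cp1252, not UTF-8.
sample = "10 m\u00b3"                 # "10 m³"
raw = sample.encode("cp1252")         # single byte 0xB3 for the superscript 3

try:
    raw.decode("utf-8")
    error_message = None
except UnicodeDecodeError as err:
    # A lone 0xB3 is a UTF-8 continuation byte, so it cannot start a character.
    error_message = str(err)

# Decoding with the encoding the file was really written in works,
# and the text can then be re-encoded as UTF-8 for the database load.
text = raw.decode("cp1252")
utf8_bytes = text.encode("utf-8")     # now 0xC2 0xB3 for the superscript 3

print(error_message)
print(text, utf8_bytes)
```

If that assumption holds, passing the real encoding when opening the file, e.g. open(path, encoding="cp1252"), should avoid the error without touching the data.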
My current pain is because I'm pumping data into a database
(PostgreSQL) which needs such measures as 5µg/m³ and Python hates me.
I think there is a valid argument for the Python utf-8 codec to
"special-case" subscript and superscript numeral unicode collisions
with ASCII or whatever Windows 10 uses. That would cover maths and
chemistry both. And save me a lot of pain.
Thanks for any sympathy and many, many thanks for help on getting past
this.
Cheers
Mike
PS: I use superscript and subscript numbers all the time because I'm
involved with chemical data. Here is how I usually deal with it ...
from django.utils.encoding import smart_text
from django.utils.safestring import mark_safe


def subscript_to_ascii(raw=None):
    """Swap subscript unicode chars into ordinary numbers for
    synonym searches.
    """
    formula = ""
    clear = True
    if raw is not None:
        # for char in str(raw):
        for char in raw:
            if char == "[":
                clear = False  # permits [1] footnote references
            elif char == "]":
                clear = True
            if clear:
                if char == "\u2082":
                    char = "2"
                elif char == "\u2083":
                    char = "3"
                elif char == "\u2084":
                    char = "4"
                elif char == "\u2085":
                    char = "5"
                elif char == "\u2086":
                    char = "6"
                elif char == "\u2087":
                    char = "7"
                elif char == "\u2088":
                    char = "8"
                elif char == "\u2089":
                    char = "9"
                elif char == "\u2081":
                    char = "1"
                elif char == "\u2080":
                    char = "0"
            formula += char
    return smart_text(formula)


def subscript(raw=None):
    """Swap ordinary numbers for subscript unicode chars."""
    formula = ""
    clear = True
    if raw is not None:
        for char in raw:
            if char == "[":
                clear = False  # permits [1] footnote references
            elif char == "]":
                clear = True
            if clear:
                if char == "2":
                    char = "\u2082"
                elif char == "3":
                    char = "\u2083"
                elif char == "4":
                    char = "\u2084"
                elif char == "5":
                    char = "\u2085"
                elif char == "6":
                    char = "\u2086"
                elif char == "7":
                    char = "\u2087"
                elif char == "8":
                    char = "\u2088"
                elif char == "9":
                    char = "\u2089"
                elif char == "1":
                    char = "\u2081"
                elif char == "0":
                    char = "\u2080"
            formula += char
    return smart_text(formula.encode("utf8"))


# String literals, so these lines run on their own
lc50 = subscript("LC50")
ld50 = subscript("LD50")


def safesubscript(raw=None, ascii=False):
    """Uses mark_safe to subscript instead of unicode chars. This looks
    better on screen but cannot be used in some places.
    """
    formula = ""
    clear = True
    if raw is not None:
        for char in raw:
            if char == "[":
                # don't process any more digits, just add to formula
                clear = False  # permits [1] footnote references
            elif char == "]":
                # start processing again
                clear = True
            if clear:
                if char == "2" or char == "\u2082":
                    char = "<sub>2</sub>"
                elif char == "3" or char == "\u2083":
                    char = "<sub>3</sub>"
                elif char == "4" or char == "\u2084":
                    char = "<sub>4</sub>"
                elif char == "5" or char == "\u2085":
                    char = "<sub>5</sub>"
                elif char == "6" or char == "\u2086":
                    char = "<sub>6</sub>"
                elif char == "7" or char == "\u2087":
                    char = "<sub>7</sub>"
                elif char == "8" or char == "\u2088":
                    char = "<sub>8</sub>"
                elif char == "9" or char == "\u2089":
                    char = "<sub>9</sub>"
                elif char == "1" or char == "\u2081":
                    char = "<sub>1</sub>"
                elif char == "0" or char == "\u2080":
                    char = "<sub>0</sub>"
            formula += char
    if ascii:
        # strip the tags again, including the closing ">"
        formula = formula.replace("<sub>", "").replace("</sub>", "")
    return mark_safe(smart_text(formula))
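
The same digit swapping could also be written with a translation table instead of the long if/elif chains. What follows is only a sketch in plain Python, not the code I actually run: it leaves out Django's smart_text, and the names SUBSCRIPTS, translate_outside_brackets, to_subscript and to_ascii are invented for the example. It keeps the rule that anything from "[" up to the next "]" (footnote references such as [1]) is left alone.

```python
import re

# U+2080 .. U+2089 are the subscript digits 0-9.
SUBSCRIPTS = "".join(chr(0x2080 + i) for i in range(10))
ASCII_TO_SUB = str.maketrans("0123456789", SUBSCRIPTS)
SUB_TO_ASCII = str.maketrans(SUBSCRIPTS, "0123456789")

# A "[" up to the next "]" (or to the end of the string if it is never closed).
BRACKETED = re.compile(r"(\[[^\]]*(?:\]|$))")


def translate_outside_brackets(raw, table):
    """Apply a str.translate table everywhere except inside [...]."""
    if raw is None:
        return ""
    # With one capturing group, re.split alternates between
    # outside text (even index) and bracketed text (odd index).
    parts = BRACKETED.split(raw)
    return "".join(
        part if index % 2 else part.translate(table)
        for index, part in enumerate(parts)
    )


def to_subscript(raw=None):
    """Swap ordinary digits for subscript unicode chars."""
    return translate_outside_brackets(raw, ASCII_TO_SUB)


def to_ascii(raw=None):
    """Swap subscript unicode chars for ordinary digits."""
    return translate_outside_brackets(raw, SUB_TO_ASCII)


print(to_subscript("H2O [1]"))
print(to_ascii(to_subscript("LC50")))
```

The table approach keeps each mapping in one place, so adding superscripts would mean one more pair of tables rather than another function full of elif branches.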
_______________________________________________
melbourne-pug mailing list
melbourne-pug@python.org
https://mail.python.org/mailman/listinfo/melbourne-pug