David

Thanks for responding. And sorry to all - I should have reported that
Javier solved my problem.

Windows 10 has some sort of gotcha with utf16 which it automatically
imposes if you don't watch closely. Just transferring a utf8 file using
ftp does it. The solution was to edit the file after transfer to Linux
and save again as utf8.

Microsoft is the pain, not superscripts.

Cheers

Mike

PS. This android phone top-posts automatically and I can't be bothered
figuring out how to defeat that.
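
For anyone who would rather script the "save again as utf8" step than do it in an editor, here is a minimal sketch. It assumes you already know what encoding the file arrived in; the function names and the file name in the usage note are made up for illustration.

```python
from pathlib import Path


def to_utf8(data, source_encoding):
    """Decode bytes from source_encoding and return them re-encoded as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def resave_as_utf8(path, source_encoding):
    """Rewrite the file at path in place as UTF-8.

    source_encoding is an assumption supplied by the caller
    (for example "utf-16" or "cp1252"); nothing here detects it.
    """
    path = Path(path)
    path.write_bytes(to_utf8(path.read_bytes(), source_encoding))
```

Something like resave_as_utf8("input.csv", "utf-16") would then replace the manual edit-and-save. Working on bytes rather than text mode leaves the line endings alone.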
-------- Original message --------
From: David Micallef <d...@montagesoftware.com.au>
Date: 8/3/20 19:31 (GMT+10:00)
To: Melbourne Python Users Group <melbourne-pug@python.org>
Subject: Re: [melbourne-pug] Superscript chars are a pain

Hi Mike

I could be missing something, though is there an opportunity to set the
encoding when you're reading the file? The default is usually utf-8 on
Linux (it follows the locale), though you can set encoding to be the
actual encoding of the file that you are reading. These files could be
ISO-8859-1 or another variant.

Cheers

Dave

On Sun, 8 Mar 2020 at 17:45, Mike Dewhirst <mi...@dewhirst.com.au> wrote:
  
    
  
  
    Oh well ... maybe it isn't Python's
      fault. I just looked at the data input file and found the ³
      character in all places had been turned into a box. When I edited
      the boxes back into ³ it all went well.
      
      I used Filezilla to get the input files across so I'll focus on
      that next.
      
      Sorry to interrupt your long weekend.
      
      Cheers
      
      Mike
      
      
      On 8/03/2020 5:30 pm, Mike Dewhirst wrote:
    
    
      
      I'm now exclusively Python 3.6+ thank heavens but
        ...
        
        UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb3 in
        position 6500: invalid start byte
        
        It just so happens that is the superscript 3 character.  It also
        happens that superscript 3 displays correctly and works properly
        on Windows 10 but causes the above error on Ubuntu 18.04. I'm
        not paid enough to understand why - hence this email if anyone
        can help.
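
        A small sketch of why that byte trips the codec, using only the
        standard library. In UTF-8 the character ³ (U+00B3) is the two
        bytes 0xC2 0xB3, so a lone 0xB3 is a continuation byte with no
        lead byte in front of it. The sample string is made up, and
        Latin-1 is used here only as one encoding that stores ³ as the
        single byte 0xB3.

```python
# "³" stored as a single byte, as Latin-1 (ISO-8859-1) does.
raw = "m³".encode("latin-1")

try:
    raw.decode("utf-8")
    message = ""
except UnicodeDecodeError as exc:
    # A lone 0xb3 is a continuation byte, so UTF-8 rejects it as a start byte.
    message = str(exc)

# The same text encoded as UTF-8 uses two bytes for "³".
utf8_bytes = "m³".encode("utf-8")

print(raw)         # b'm\xb3'
print(utf8_bytes)  # b'm\xc2\xb3'
print(message)
```

        Reading the file with the encoding it was actually written in,
        then writing it back out as UTF-8, avoids the error.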
        
        My current pain is because I'm pumping data into a database
        (PostgreSQL) which needs such measures as 5µg/m³ and Python
        hates me.
        
        I think there is a valid argument for the Python utf-8 codec to
        "special-case" subscript and superscript numeral unicode
        collisions with ASCII or whatever Windows 10 uses. That would
        cover maths and chemistry both. And save me a lot of pain.
        
        Thanks for any sympathy and many, many thanks for help on
        getting past this.
        
        Cheers
        
        Mike
        
        PS: I use superscript and subscript numbers all the time because
        I'm involved with chemical data. Here is how I usually deal with
        it ...
        
        
       
          from django.utils.encoding import smart_text
          from django.utils.safestring import mark_safe
          
          
          def subscript_to_ascii(raw=None):
              """Swap subscript unicode chars into ordinary numbers for
              synonym searches.
              """
              formula = ""
              clear = True
              if raw is not None:
                  # for char in str(raw):
                  for char in raw:
                      if char == "[":
                          clear = False  # permits [1] footnote references
                      elif char == "]":
                          clear = True
                      if clear:
                          if char == "\u2082":
                              char = "2"
                          elif char == "\u2083":
                              char = "3"
                          elif char == "\u2084":
                              char = "4"
                          elif char == "\u2085":
                              char = "5"
                          elif char == "\u2086":
                              char = "6"
                          elif char == "\u2087":
                              char = "7"
                          elif char == "\u2088":
                              char = "8"
                          elif char == "\u2089":
                              char = "9"
                          elif char == "\u2081":
                              char = "1"
                          elif char == "\u2080":
                              char = "0"
                      formula += char
              return smart_text(formula)
          
          
          def subscript(raw=None):
              """Swap ordinary numbers for subscript unicode chars."""
              formula = ""
              clear = True
              if raw is not None:
                  for char in raw:
                      if char == "[":
                          clear = False  # permits [1] footnote references
                      elif char == "]":
                          clear = True
                      if clear:
                          if char == "2":
                              char = "\u2082"
                          elif char == "3":
                              char = "\u2083"
                          elif char == "4":
                              char = "\u2084"
                          elif char == "5":
                              char = "\u2085"
                          elif char == "6":
                              char = "\u2086"
                          elif char == "7":
                              char = "\u2087"
                          elif char == "8":
                              char = "\u2088"
                          elif char == "9":
                              char = "\u2089"
                          elif char == "1":
                              char = "\u2081"
                          elif char == "0":
                              char = "\u2080"
                      formula += char
              return smart_text(formula)
          
          
          lc50 = subscript("LC50")
          ld50 = subscript("LD50")
          
          
          def safesubscript(raw=None, ascii=False):
              """Uses mark_safe to subscript instead of unicode chars. This looks
              better on screen but cannot be used in places.
              """
              formula = ""
              clear = True
              if raw is not None:
                  for char in raw:
                      if char == "[":
                          # don't process any more digits, just add to formula
                          clear = False  # permits [1] footnote references
                      elif char == "]":
                          # start processing again
                          clear = True
                      if clear:
                          if char == "2" or char == "\u2082":
                              char = "<sub>2</sub>"
                          elif char == "3" or char == "\u2083":
                              char = "<sub>3</sub>"
                          elif char == "4" or char == "\u2084":
                              char = "<sub>4</sub>"
                          elif char == "5" or char == "\u2085":
                              char = "<sub>5</sub>"
                          elif char == "6" or char == "\u2086":
                              char = "<sub>6</sub>"
                          elif char == "7" or char == "\u2087":
                              char = "<sub>7</sub>"
                          elif char == "8" or char == "\u2088":
                              char = "<sub>8</sub>"
                          elif char == "9" or char == "\u2089":
                              char = "<sub>9</sub>"
                          elif char == "1" or char == "\u2081":
                              char = "<sub>1</sub>"
                          elif char == "0" or char == "\u2080":
                              char = "<sub>0</sub>"
                      formula += char
              if ascii:
                  formula = formula.replace("<sub>", "").replace("</sub>", "")
              return mark_safe(smart_text(formula))
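
          The two digit-swapping functions above could also be written
          with translation tables. This is only a sketch of an
          alternative, not the code in use: the names TO_SUBSCRIPT,
          TO_ASCII, BRACKETED and translate_outside_brackets are
          hypothetical, and the Django calls are left out so it runs on
          the standard library alone.

```python
import re

ASCII_DIGITS = "0123456789"
# U+2080 .. U+2089 are SUBSCRIPT ZERO .. SUBSCRIPT NINE.
SUBSCRIPT_DIGITS = "".join(chr(0x2080 + d) for d in range(10))

TO_SUBSCRIPT = str.maketrans(ASCII_DIGITS, SUBSCRIPT_DIGITS)
TO_ASCII = str.maketrans(SUBSCRIPT_DIGITS, ASCII_DIGITS)

# Matches a [..] footnote reference. An unclosed "[" swallows the rest of
# the string, which mirrors the clear/not-clear flag in the loops above.
BRACKETED = re.compile(r"(\[[^\]]*\]?)")


def translate_outside_brackets(raw, table):
    """Apply table to raw, leaving anything inside [..] untouched."""
    if raw is None:
        return ""
    parts = BRACKETED.split(raw)
    # The capturing group puts the bracketed segments at odd indexes.
    return "".join(
        part if index % 2 else part.translate(table)
        for index, part in enumerate(parts)
    )
```

          With that, translate_outside_brackets("H2SO4[1]", TO_SUBSCRIPT)
          gives "H₂SO₄[1]", and passing TO_ASCII turns it back into
          "H2SO4[1]".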
          
          
          
          
        
         
         
         
        
       
      
      _______________________________________________
melbourne-pug mailing list
melbourne-pug@python.org
https://mail.python.org/mailman/listinfo/melbourne-pug

    
    
  
