Re: [Lldb-commits] [PATCH] D16736: Always write the session log file in UTF-8

Todd Fiala via lldb-commits Sun, 31 Jan 2016 12:05:24 -0800

tfiala added a comment.

> I'm going to have a look at trying this modification now.



I'm getting the same error with the replace.

Here is the patch (okay, the whole encoded_file.py) I was able to use to get 
past this. The problem ultimately looks to be in the match-result printing of 
a raw byte buffer (never meant to be unicode printable) in the result stream. 
That is, the error I was getting looked to be purely a by-product of printing 
match results that succeeded, not failed; the unicode decoding introduced the 
actual failure point:

  """
                       The LLVM Compiler Infrastructure
  
  This file is distributed under the University of Illinois Open Source
  License. See LICENSE.TXT for details.
  
  Prepares language bindings for LLDB build process.  Run with --help
  to see a description of the supported command line arguments.
  """
  
  # Python modules:
  import io
  
  # Third party modules
  import six
  
  def _encoded_read(old_read, encoding):
      def impl(size):
          result = old_read(size)
          # If this is Python 2 then we need to convert the resulting
          # `unicode` back into a `str` before returning
          if six.PY2:
              result = result.encode(encoding)
          return result
      return impl
  
  def _encoded_write(old_write, encoding):
      def impl(s):
          # If we were asked to write a `str` (in Py2) or a `bytes` (in Py3),
          # decode it as unicode before attempting to write.
          if isinstance(s, six.binary_type):
              try:
                  s = s.decode(encoding)
              except UnicodeDecodeError as decode_err:
                  import sys
                  sys.stderr.write("error: unicode decode failed on raw string "
                                   "'{}': '{}'".format(s, decode_err))
                  s = u"Could not decode unicode string, see stderr for details"
          return old_write(s)
      return impl
  
  '''
  Create a Text I/O file object that can be written to with either unicode
  strings or byte strings under Python 2 and Python 3, and automatically
  encodes and decodes as necessary to return the native string type for the
  current Python version
  '''
  def open(file, encoding, mode='r', buffering=-1, errors=None, newline=None,
           closefd=True):
      wrapped_file = io.open(file, mode=mode, buffering=buffering,
                             encoding=encoding, errors=errors, newline=newline,
                             closefd=closefd)
      new_read = _encoded_read(getattr(wrapped_file, 'read'), encoding)
      new_write = _encoded_write(getattr(wrapped_file, 'write'), encoding)
      setattr(wrapped_file, 'read', new_read)
      setattr(wrapped_file, 'write', new_write)
      return wrapped_file

It just adds a try/except block around the Unicode decode.  It is quite 
possible that this will not run on Python 3, i.e. that it ping-pongs this back 
into a Python 3 error.  I may try to bring this up on Windows to see if that 
actually happens.
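
For reference, the fallback can be exercised on its own. This is a minimal 
sketch using only the standard library; `decode_for_log` is a hypothetical 
name, not part of the patch, and it only mirrors the try/except in 
`_encoded_write` above without touching any file object:

```python
import sys


def decode_for_log(s, encoding="utf-8"):
    # Hypothetical helper mirroring the try/except in _encoded_write.
    # `bytes` is an alias of `str` on Python 2, so this check covers both.
    if isinstance(s, bytes):
        try:
            return s.decode(encoding)
        except UnicodeDecodeError as decode_err:
            # Report the raw buffer on stderr and hand back a placeholder
            # so that the caller's write() still receives unicode text.
            sys.stderr.write("error: unicode decode failed on raw string "
                             "{!r}: {}\n".format(s, decode_err))
            return u"Could not decode unicode string, see stderr for details"
    # Already unicode text: pass it through untouched.
    return s
```

Valid UTF-8 bytes come back as text, while an invalid buffer such as 
`b"\xff\xfe"` yields the placeholder string and a message on stderr.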

In any event, the right fix here is probably for the expect-string-match code 
to *not* try to print matched/expected text when the data is known to be 
binary, since such output has no way of being valid text.  The other way to go 
(perhaps better) would be to put some kind of safe wrapper around the byte 
compares, so that they are tested as ASCII-ified output, or to use an entirely 
different mechanism here.
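
One way such an ASCII-ifying wrapper might look is sketched below. The 
`asciify` name and the `\xNN` escaping scheme are assumptions made for 
illustration; nothing like this exists in the patch or, as far as this thread 
shows, in the test suite:

```python
def asciify(data):
    """Render a byte buffer as printable ASCII (hypothetical helper).

    Printable ASCII bytes are kept as-is; every other byte, and the
    backslash itself, is written as a \\xNN escape so the result is
    unambiguous and always safe to print or log.
    """
    out = []
    # bytearray() yields integers on both Python 2 and Python 3.
    for b in bytearray(data):
        if 0x20 <= b < 0x7f and b != 0x5c:
            out.append(chr(b))
        else:
            out.append("\\x{:02x}".format(b))
    return "".join(out)
```

With something along these lines, matching and reporting could both operate on 
`asciify(buffer)`, so no decode of the raw bytes is ever attempted when the 
result is written to the session log.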

If this works on Windows the way I fixed this up, you can go ahead and check 
this in.  (I no longer get failures with the code change I made above).  If it 
doesn't work but you can tweak that slightly, feel free to go ahead and do that 
as well.  In the meantime I am going to see if I can get the binary aspect of 
the matching handled properly (i.e. not done as string compares).  This might 
be one of my tests.  (It's at least in the goop of lldb-server tests that I had 
written 1.5 to 2 years ago).


http://reviews.llvm.org/D16736



_______________________________________________
lldb-commits mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
