On Sun, Apr 26, 2020 at 09:14:03PM +0300, Sam Eiderman wrote:
> The python3 bindings create PyUnicode objects from application strings
> on the guest (i.e. installed rpm, deb packages).
> It is documented that rpm package fields such as description should be
> UTF-8 encoded. However, in some cases they are not valid UTF-8. On
> SLES11 SP4 the descriptions of the following packages are encoded as
> latin1, so they fail to convert to unicode using
> guestfs_int_py_fromstring() (which invokes PyUnicode_FromString()):
> 
>  PackageKit
>  aaa_base
>  coreutils
>  dejavu
>  desktop-data-SLED
>  gnome-utils
>  hunspell
>  hunspell-32bit
>  hunspell-tools
>  libblocxx6
>  libexif
>  libgphoto2
>  libgtksourceview-2_0-0
>  libmpfr1
>  libopensc2
>  libopensc2-32bit
>  liborc-0_4-0
>  libpackagekit-glib10
>  libpixman-1-0
>  libpixman-1-0-32bit
>  libpoppler-glib4
>  libpoppler5
>  libsensors3
>  libtelepathy-glib0
>  m4
>  opensc
>  opensc-32bit
>  permissions
>  pinentry
>  poppler-tools
>  python-gtksourceview
>  splashy
>  syslog-ng
>  tar
>  tightvnc
>  xorg-x11
>  xorg-x11-xauth
>  yast2-mouse
> 
> Fix this by globally changing guestfs_int_py_fromstring()
> and guestfs_int_py_fromstringsize() to fall back to latin1 decoding if
> utf-8 decoding fails.
> 
> The choice of error handler doesn't matter in the case of latin1:
> every byte value maps to a code point, so decoding never fails and
> "strict" has the same effect as "replace":
> 
>  https://docs.python.org/3/library/codecs.html#error-handlers
> 
> Signed-off-by: Sam Eiderman <sam...@google.com>
> ---
>  python/handle.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/python/handle.c b/python/handle.c
> index 2fb8c18f0..fe89dc58a 100644
> --- a/python/handle.c
> +++ b/python/handle.c
> @@ -387,7 +387,7 @@ guestfs_int_py_fromstring (const char *str)
>  #if PY_MAJOR_VERSION < 3
>    return PyString_FromString (str);
>  #else
> -  return PyUnicode_FromString (str);
> +  return guestfs_int_py_fromstringsize (str, strlen (str));
>  #endif
>  }
>  
> @@ -397,7 +397,12 @@ guestfs_int_py_fromstringsize (const char *str, size_t size)
>  #if PY_MAJOR_VERSION < 3
>    return PyString_FromStringAndSize (str, size);
>  #else
> -  return PyUnicode_FromStringAndSize (str, size);
> +  PyObject *s = PyUnicode_FromStringAndSize (str, size);
> +  if (s == NULL) {
> +    PyErr_Clear ();
> +    s = PyUnicode_Decode (str, size, "latin1", "strict");
> +  }
> +  return s;
>  #endif
>  }
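
For reference, the decoding behaviour the patch relies on can be sketched
at the Python level. This is only an illustration, not the bindings' code:
decode_guest_string() and the sample strings are hypothetical names made
up for the example. It tries UTF-8 first, falls back to latin1, and checks
that "strict" and "replace" agree for latin1 on every possible byte.

```python
def decode_guest_string(raw: bytes) -> str:
    """Decode bytes as UTF-8, falling back to latin1 if that fails.

    Hypothetical helper mirroring the fallback described in the patch.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin1 maps each of the 256 byte values to a code point,
        # so this decode cannot fail, even with the "strict" handler.
        return raw.decode("latin1", "strict")


# The same text encoded two ways (sample data, not from a real package).
utf8_description = "Gr\u00fc\u00dfe".encode("utf-8")
latin1_description = "Gr\u00fc\u00dfe".encode("latin1")

print(decode_guest_string(utf8_description))
print(decode_guest_string(latin1_description))

# For latin1, "strict" and "replace" give identical results on all bytes.
all_bytes = bytes(range(256))
print(all_bytes.decode("latin1", "strict") == all_bytes.decode("latin1", "replace"))
```

Note that the fallback only yields the intended text if the data really is
latin1; bytes in some other legacy encoding would still decode without
error, just to the wrong characters.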

Looks OK to me.  Pino - any objections to merging this?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
