The Python 3 bindings create PyUnicode objects from application strings
read from the guest (e.g. fields of installed rpm and deb packages).
Package fields such as the rpm description are documented to be UTF-8
encoded, but in some cases they are not valid UTF-8. On SLES11 SP4 the
descriptions of the following packages are latin1-encoded, so
converting them with guestfs_int_py_fromstring() (which calls
PyUnicode_FromString()) fails:

 PackageKit
 aaa_base
 coreutils
 dejavu
 desktop-data-SLED
 gnome-utils
 hunspell
 hunspell-32bit
 hunspell-tools
 libblocxx6
 libexif
 libgphoto2
 libgtksourceview-2_0-0
 libmpfr1
 libopensc2
 libopensc2-32bit
 liborc-0_4-0
 libpackagekit-glib10
 libpixman-1-0
 libpixman-1-0-32bit
 libpoppler-glib4
 libpoppler5
 libsensors3
 libtelepathy-glib0
 m4
 opensc
 opensc-32bit
 permissions
 pinentry
 poppler-tools
 python-gtksourceview
 splashy
 syslog-ng
 tar
 tightvnc
 xorg-x11
 xorg-x11-xauth
 yast2-mouse

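Fix this globally by changing guestfs_int_py_fromstring() and
guestfs_int_py_fromstringsize() to fall back to latin1 decoding when
UTF-8 decoding fails. guestfs_int_py_fromstring() now simply delegates
to guestfs_int_py_fromstringsize(), so the fallback lives in one place.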

The error handler passed for the latin1 decode does not matter: latin1
maps every byte value 0x00-0xFF to a code point, so decoding can never
fail and "strict" behaves exactly like "replace":

 https://docs.python.org/3/library/codecs.html#error-handlers

Signed-off-by: Sam Eiderman <sam...@google.com>
---
 python/handle.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/python/handle.c b/python/handle.c
index 2fb8c18f0..fe89dc58a 100644
--- a/python/handle.c
+++ b/python/handle.c
@@ -387,7 +387,7 @@ guestfs_int_py_fromstring (const char *str)
 #if PY_MAJOR_VERSION < 3
   return PyString_FromString (str);
 #else
-  return PyUnicode_FromString (str);
+  return guestfs_int_py_fromstringsize (str, strlen (str));
 #endif
 }
 
@@ -397,7 +397,12 @@ guestfs_int_py_fromstringsize (const char *str, size_t size)
 #if PY_MAJOR_VERSION < 3
   return PyString_FromStringAndSize (str, size);
 #else
-  return PyUnicode_FromStringAndSize (str, size);
+  PyObject *s = PyUnicode_FromStringAndSize (str, size);
+  if (s == NULL && PyErr_ExceptionMatches (PyExc_UnicodeDecodeError)) {
+    PyErr_Clear ();
+    s = PyUnicode_Decode (str, size, "latin1", "strict");
+  }
+  return s;
 #endif
 }
 
-- 
2.26.2.303.gf8c07b1a785-goog


_______________________________________________
Libguestfs mailing list
Libguestfs@redhat.com
https://www.redhat.com/mailman/listinfo/libguestfs
