The python3 bindings create PyUnicode objects from application strings on the guest (i.e. installed rpm, deb packages). It is documented that rpm package fields such as the description should be UTF-8 encoded. However, in some cases they are not valid UTF-8: on SLES11 SP4 the descriptions of the following packages are encoded in latin1, so they fail to be converted to unicode by guestfs_int_py_fromstring() (which invokes PyUnicode_FromString()):

 PackageKit aaa_base coreutils dejavu desktop-data-SLED gnome-utils
 hunspell hunspell-32bit hunspell-tools libblocxx6 libexif libgphoto2
 libgtksourceview-2_0-0 libmpfr1 libopensc2 libopensc2-32bit
 liborc-0_4-0 libpackagekit-glib10 libpixman-1-0 libpixman-1-0-32bit
 libpoppler-glib4 libpoppler5 libsensors3 libtelepathy-glib0 m4 opensc
 opensc-32bit permissions pinentry poppler-tools python-gtksourceview
 splashy syslog-ng tar tightvnc xorg-x11 xorg-x11-xauth yast2-mouse

Fix this by globally changing guestfs_int_py_fromstring() and
guestfs_int_py_fromstringsize() to fall back to latin1 decoding if
UTF-8 decoding fails.

The choice of the "strict" error handler does not matter for latin1,
since every byte value decodes to a code point; it has the same effect
as "replace":

https://docs.python.org/3/library/codecs.html#error-handlers

Signed-off-by: Sam Eiderman <[email protected]>
---
 python/handle.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/python/handle.c b/python/handle.c
index 2fb8c18f0..fe89dc58a 100644
--- a/python/handle.c
+++ b/python/handle.c
@@ -387,7 +387,7 @@ guestfs_int_py_fromstring (const char *str)
 #if PY_MAJOR_VERSION < 3
   return PyString_FromString (str);
 #else
-  return PyUnicode_FromString (str);
+  return guestfs_int_py_fromstringsize (str, strlen (str));
 #endif
 }
 
@@ -397,7 +397,12 @@ guestfs_int_py_fromstringsize (const char *str, size_t size)
 #if PY_MAJOR_VERSION < 3
   return PyString_FromStringAndSize (str, size);
 #else
-  return PyUnicode_FromStringAndSize (str, size);
+  PyObject *s = PyUnicode_FromStringAndSize (str, size);
+  if (s == NULL) {
+    PyErr_Clear ();
+    s = PyUnicode_Decode (str, size, "latin1", "strict");
+  }
+  return s;
 #endif
 }
 
-- 
2.26.2.303.gf8c07b1a785-goog
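
The fallback described in the commit message can be sketched in pure Python, which may make the behaviour easier to check without building the bindings. This is only an illustration: decode_guest_string() is a hypothetical helper, not part of libguestfs, and it mirrors the C logic only under the assumption that the input is an arbitrary byte string read from the guest.

```python
def decode_guest_string(raw: bytes) -> str:
    """Hypothetical helper: decode as UTF-8, fall back to latin1 on failure."""
    try:
        return raw.decode("utf-8", "strict")
    except UnicodeDecodeError:
        # latin1 maps every byte 0x00-0xFF to a code point, so this
        # decode cannot fail regardless of the error handler chosen.
        return raw.decode("latin1", "strict")


if __name__ == "__main__":
    utf8_bytes = "caf\u00e9".encode("utf-8")     # b"caf\xc3\xa9"
    latin1_bytes = "caf\u00e9".encode("latin1")  # b"caf\xe9", invalid UTF-8
    print(decode_guest_string(utf8_bytes))
    print(decode_guest_string(latin1_bytes))
```

Both calls yield the same text, and decoding all 256 byte values as latin1 gives identical results with "strict" and "replace", which is why the error handler passed to PyUnicode_Decode() makes no practical difference here.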
