On Wednesday, March 23, 2016 9:36:45 PM WET Georg Baum wrote:
> You are right. I did only test the patch manually with some of the
> conversions, and they did work. Now I did test it more systematically in
> the build system, and it turned out that in some cases the u prefix is
> needed, but not in all. Why? Or is this code simply not called? There were
> also some encode/decode calls that do now need to be removed (here I
> understand why).

The reason is explained here:

    $ ipython --no-banner
    In [1]: type( "123%s" % "")
    Out[1]: str

    In [2]: type( "123%s" % u"")
    Out[2]: unicode

The issue is that, in Python 2, if you interpolate (the % operator) a string
with a string you get a string. If you interpolate a string with a unicode
string you get a unicode string.

In all those cases where you pass a string read from a file for which you
declared the encoding to be utf-8, you are already using a unicode string, so
the interpolation results in a unicode string and all works.

In Python 3 all strings are unicode strings, so all works:

    $ ipython3 --no-banner
    In [1]: type( "123%s" % "")
    Out[1]: str

    In [2]: type( "123%s" % u"")
    Out[2]: str

So a safe bet would be to prefix all the strings with a u: overkill, sure, but
it will surely work. :-D

Incidentally, this is the reason why I insisted that if we support Python 3 we
should go with at least 3.3. In Python 3.3 the u string prefix was
reintroduced into the language, where it is a no-op, which allows using the
same code for Python 2 and Python 3.

> Attached is the updated patch, but since I do not completely understand it
> I think we should postpone it.
>
> Georg

It is just a matter of testing it and, where it fails, adding the u prefix to
the string. :-)

Thank you for taking care of this.
-- 
José Abílio
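
P.S. The same check can be written as a small script that runs unchanged on Python 2 and on Python 3.3 or later. This is only a minimal sketch: the names `text_type` and `result_type` are made up for illustration and do not come from the patch.

```python
import sys

# Type of text strings: `unicode` on Python 2, `str` on Python 3.
# The u prefix is accepted by Python 2 and by Python 3.3 or later.
text_type = type(u"")


def result_type(template, value):
    """Return the type produced by %-interpolating value into template."""
    return type(template % value)


plain = result_type("123%s", "")      # no prefix anywhere
mixed = result_type("123%s", u"")     # only the argument is prefixed
prefixed = result_type(u"123%s", "")  # only the template is prefixed

print(sys.version_info[0], plain.__name__, mixed.__name__, prefixed.__name__)
```

As soon as one of the two operands carries the u prefix, the result is a text string under both interpreters. Only the case without any prefix differs between Python 2 and Python 3.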