On Wednesday, March 23, 2016 9:36:45 PM WET Georg Baum wrote:
> You are right. I did only test the patch manually with some of the 
> conversions, and they did work. Now I did test it more systematically in
> the build system, and it turned out that in some cases the u prefix is
> needed, but not in all. Why? Or is this code simply not called? There were
> also some encode/decode calls that do now need to be removed (here I
> understand why).

The reason is explained here:

$ ipython --no-banner

In [1]: type( "123%s" % "")
Out[1]: str

In [2]: type( "123%s" % u"")
Out[2]: unicode

The issue is that, in Python 2, if you interpolate (the % operator) a string 
with a string you get a string.

If you interpolate a string with a unicode string you get a unicode string.

In all those cases where you pass a string read from a file whose encoding 
you declared to be utf-8, you are already using a unicode string, so the 
interpolation results in a unicode string and all works.
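
A minimal sketch of that file case, assuming the file is read with io.open 
and an explicit encoding (the temporary file and its content are made up 
for illustration, they are not from the patch):

```python
import io
import os
import tempfile

# Create a small UTF-8 file to read back (hypothetical content).
fd, path = tempfile.mkstemp()
os.close(fd)
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"caf\xe9")

# io.open with an explicit encoding returns text: unicode on Python 2,
# str on Python 3.
with io.open(path, "r", encoding="utf-8") as f:
    content = f.read()
os.remove(path)

# The template has no u prefix, but interpolating a unicode value
# promotes the result to the text type.
result = "123%s" % content
is_text = isinstance(result, type(u""))
print(is_text)
```

Here the template itself is a plain string, yet the result is a unicode 
string because the interpolated value already is one.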

In Python 3 all strings are unicode strings, so everything works:

$ ipython3 --no-banner

In [1]: type( "123%s" % "")
Out[1]: str

In [2]: type( "123%s" % u"")
Out[2]: str

So a safe bet would be to prefix all the strings with u. That is overkill, 
sure, but it will surely work. :-D

Incidentally, this is the reason why I insisted that if we support Python 3 
we should require at least 3.3. In Python 3.3 the u string prefix was 
reintroduced into the language, where it is a no-op, allowing the same code 
to be used for Python 2 and Python 3.
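
As a rough sketch of that point (the names template and name are only 
illustrative, not taken from the patch), the following should run unchanged 
under Python 2 and under Python 3.3 or later:

```python
# Hypothetical values, for illustration only.
template = u"Hello %s"
name = u"Jos\xe9"

result = template % name

# type(u"") is unicode on Python 2 and str on Python 3, that is,
# the text type of whichever interpreter runs this.
text_type = type(u"")
print(isinstance(result, text_type))
```

With the prefix on the template the result is a text string regardless of 
what gets interpolated into it.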

> Attached is the updated patch, but since I do not completely understand it
> I  think we should postpone it.
> 
> Georg

It is just a matter of testing it and, where it fails, adding the u prefix 
to the string. :-)

Thank you for taking care of this.
-- 
José Abílio
