[ovirt-devel] Re: unicode_literals vs "u''" vs six.text_type

Amit Bawer Sun, 01 Sep 2019 05:38:39 -0700

On Sun, Sep 1, 2019 at 2:34 PM Yedidyah Bar David <[email protected]> wrote:


> On Sun, Sep 1, 2019 at 1:20 PM Amit Bawer <[email protected]> wrote:
> >
> >
> >
> > On Sun, Sep 1, 2019 at 10:28 AM Yedidyah Bar David <[email protected]> wrote:
> >>
> >> Hi all,
> >>
> >> That's a "sub-thread" of "unicode sandwich in otopi/engine-setup".
> >>
> >> I was recommended to use "six.text_type()" over "u''". I did read [1],
> >> and eventually decided that my own preference is to just add "u"
> >> prefix. Reasoning is inside [1].
> >>
> >> Do people have different preferences/reasoning they want to share?
> >>
> >> Do people think we should have project-wide policy re this?
> >
> >
> > Since our code is currently transitioning from py2 to py2/py3, and not
> > from py3 to py3/py2, it would be fair to assume that most already
> > existing string literals in it contain only ascii symbols, unless
> > explicitly stated otherwise; so IMO it would only make sense to
> > enforce 'u' on newly added literals which involve non-ascii symbols,
> > as long as py2 is still alive.
>
> Not exactly.
>
> Suppose (mostly correctly) that the code didn't employ the "unicode
> sandwich" technique so far. Meaning, much was handled as python2 str
> objects containing utf-8-encoded strings, and converted to unicode
> objects mainly as needed/noted/considered. Suppose that x is a
> variable that used to contain such an str, usually ascii-only, but
> sometimes perhaps utf-8. Now, this:
>
> 'x: {}'.format(x)
>
> would work, and replace {} with the contents of x, and return a
> python2 str, utf-8-encoded if x is utf-8. But if x now contains a
> unicode object (because we decided to follow the sandwich approach,
> and decode all utf-8 during input), it would fail if x is not
> ascii-only. Adding u to 'x: {}' solves this.
>

utf-8 is an extension of ascii, meaning that the first 128 code points are
encoded identically in both, so the unicode sandwich has no negative effect
on your example as long as x is ascii-only. It would only be a problem if
the input for x originally had a non-ascii character in it, but that should
have been an issue for py2 in the first place, regardless of py3 sandwiches.
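
To make the case under discussion concrete, here is a minimal sketch, not
code from otopi or engine-setup. py2's implicit str/unicode coercion cannot
be reproduced on py3, so the hypothetical helper format_into_bytes() imitates
it by encoding the value with the ascii codec before substituting it into a
byte-string template. The helper name and the sample values are assumptions
made purely for illustration.

```python
def format_into_bytes(template, value):
    """Hypothetical stand-in for py2's 'x: {}'.format(unicode_value).

    py2 coerces the unicode argument to a byte string with the ascii
    codec; here that step is made explicit so it behaves the same on py3.
    """
    encoded = value.encode('ascii')  # raises UnicodeEncodeError on non-ascii
    return template.replace(b'{}', encoded)


ascii_value = u'abc'
non_ascii_value = u'caf\xe9'  # 'cafe' with an acute accent on the e

# An ascii-only value passes through the byte-string template unchanged.
bytes_result = format_into_bytes(b'x: {}', ascii_value)

# A non-ascii value cannot be coerced with the ascii codec.
try:
    format_into_bytes(b'x: {}', non_ascii_value)
    coercion_failed = False
except UnicodeEncodeError:
    coercion_failed = True

# With a u'' template no coercion is needed, on py2 and py3 alike.
text_result = u'x: {}'.format(non_ascii_value)
```

In this sketch the byte-string template fails only for the non-ascii value,
while the u'' template accepts both values.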


> So I also have to handle all existing such literals, at least those
> that would now need to handle unicode vars.
>
> >
> >>
> >>
> >> Personally, I do not see the big advantage of adding "six.text_type()"
> >> (15 chars) instead of a single "u". I do see where it can be useful,
> >> but not as a very long replacement, IMO, for "u", or for
> >> unicode_literals.
> >
> >
> > Once py2 is officially terminated, probably neither option mentioned
> > above will be meaningful, as py3's str type is unicode by default;
> > however IMO for literals an explicit 'u' seems a more native approach,
> > and provides clarity about the intentions of the programmer compared
> > to a global switch in the form of importing unicode_literals. Using
> > six.text_type() is probably a good solution nowadays for variables, not
> > literals, and would probably have to die off some day after py2 does
> > the same.
> >
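
The split between "u" for literals and six.text_type() for variables could
look roughly like the sketch below. It is only an illustration under stated
assumptions: text_type is defined locally as a stand-in for six.text_type so
the snippet runs without six installed, and to_text() is a hypothetical
helper, not an existing otopi or six function.

```python
import sys

# Local stand-in for six.text_type: str on py3, unicode on py2.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (name exists on py2 only)


def to_text(value, encoding='utf-8'):
    """Hypothetical input-boundary helper for the unicode sandwich.

    Byte strings are decoded; anything else is converted with text_type().
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


# Variables coming from outside are normalised with the helper ...
raw_input_value = b'caf\xc3\xa9'  # utf-8 encoded bytes, as read from a file
x = to_text(raw_input_value)

# ... while literals in the code simply carry an explicit u prefix.
message = u'x: {}'.format(x)
```

Here the u prefix marks the literal, and text_type() is applied only to
values whose type is not known in advance.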
> >>
> >>
> >> Thanks and best regards,
> >>
> >> [1] http://python-future.org/unicode_literals.html
> >> --
> >> Didi
> >> _______________________________________________
> >> Devel mailing list -- [email protected]
> >> To unsubscribe send an email to [email protected]
> >> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> >> oVirt Code of Conduct:
> >> https://www.ovirt.org/community/about/community-guidelines/
> >> List Archives:
> >> https://lists.ovirt.org/archives/list/[email protected]/message/SW3P4VOGBP43N54CQEH3YURN6X5ZMWIX/
>
>
>
> --
> Didi
>

_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/T5S3SCV23QNL67WKHVVLPXXL4AYNTW3M/
