[ovirt-devel] Re: unicode_literals vs "u''" vs six.text_type

Amit Bawer Mon, 02 Sep 2019 00:27:23 -0700

On Sun, Sep 1, 2019 at 8:23 PM Yedidyah Bar David <[email protected]> wrote:


> On Sun, Sep 1, 2019 at 3:37 PM Amit Bawer <[email protected]> wrote:
> >
> >
> >
> > On Sun, Sep 1, 2019 at 2:34 PM Yedidyah Bar David <[email protected]> wrote:
> >>
> >> On Sun, Sep 1, 2019 at 1:20 PM Amit Bawer <[email protected]> wrote:
> >> >
> >> >
> >> >
> >> > On Sun, Sep 1, 2019 at 10:28 AM Yedidyah Bar David <[email protected]> wrote:
> >> >>
> >> >> Hi all,
> >> >>
> >> >> That's a "sub-thread" of "unicode sandwich in otopi/engine-setup".
> >> >>
> >> >> I was recommended to use six.text_type() over "u''". I did read [1],
> >> >> and eventually decided that my own preference is to just add "u"
> >> >> prefix. Reasoning is inside [1].
> >> >>
> >> >> Do people have different preferences/reasoning they want to share?
> >> >>
> >> >> Do people think we should have project-wide policy re this?
> >> >
> >> >
> >> > Since our code is currently transitioning from py2 to py2/py3, and
> >> > not from py3 to py3/py2, it would be fair to assume that most
> >> > already existing string literals in it contain ascii symbols, unless
> >> > explicitly stated otherwise; so IMO it would only make sense to
> >> > enforce 'u' over newly added literals which involve non-ascii
> >> > symbols as long as py2 is still alive.
> >>
> >> Not exactly.
> >>
> >> Suppose (mostly correctly) that the code didn't employ the "unicode
> >> sandwich" technique so far. Meaning, much was handled as python2 str
> >> objects containing utf-8-encoded strings, and converted to unicode
> >> objects mainly as needed/noted/considered. Suppose that x is a
> >> variable that used to contain such an str, usually ascii-only, but
> >> sometimes perhaps utf-8. Now, this:
> >>
> >> 'x: {}'.format(x)
> >>
> >> would work, and replace {} with the contents of x, and return a
> >> python2 str, utf-8-encoded if x is utf-8. But if now x contains a
> >> unicode object (because we decided to follow the sandwich approach,
> >> and decode all utf-8 during input), it would fail, if x is not
> >> ascii-only. Adding u to 'x: {}' solves this.
> >
> >
> > utf-8 is an ascii extension, meaning that the first 128 ordinals agree
> > for both encodings, so unicode sandwich has no negative effect on your
> > example. It would be a problem only if input for x originally had a
> > non-ascii character in it, but that should have been an issue for py2
> > in the first place, regardless of py3 sandwiches.
>
> Let me clarify:
>

Thanks, now I see where I was wrong.


> In python2:
>
> If I start with:
>
> x='א'
>

py2: x is 2 bytes: '\xd7\x90'
py3: x is unicode str with a single symbol '\u05d0'
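
To make the two representations concrete, here is a small sketch. The
names x_text, x_bytes and byte_values are mine, just for illustration,
and it assumes a utf-8 source file on the py2 side. It should run the
same way on py2 and py3:

```python
# -*- coding: utf-8 -*-
# Hebrew letter alef, written as an escape so the source encoding does not matter.
x_text = u'\u05d0'

# The utf-8 bytes that a plain py2 'א' literal would hold (assuming a utf-8 source file).
x_bytes = x_text.encode('utf-8')

# bytearray() yields integers on both py2 and py3, so hex() works on each item.
byte_values = [hex(b) for b in bytearray(x_bytes)]

print(byte_values)  # ['0xd7', '0x90']
print('bytes: %d, symbols: %d' % (len(x_bytes), len(x_text)))
```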


> '{}'.format(x)
>
> Works.


py2: both the template and x are byte strs, so format() just copies the
two bytes with no implicit decoding; it's fine.
py3: both are unicode strings by default, so it's fine.


>
> If I then employ the sandwich, and therefore effectively change the code
> to be:
>
> x=u'א'
>

Now py2 and py3 agree on the contents of x, so sandwiching seems like the
right choice to make sure they treat x the same way.
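
As I understand the sandwich, it boils down to: decode bytes to text on
input, work only with text inside, and encode back to bytes on output.
A minimal sketch of that idea follows. The helpers to_text() and
to_bytes() are hypothetical names of mine, not anything from otopi or
six, and utf-8 is an assumed encoding:

```python
def to_text(value, encoding='utf-8'):
    """Input boundary: decode bytes to text, pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding='utf-8'):
    """Output boundary: encode text to bytes, pass bytes through unchanged."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


# Bytes as they might arrive from a file or a terminal (assumed utf-8).
raw_input_bytes = b'\xd7\x90'

x = to_text(raw_input_bytes)      # text from here on
message = u'x: {}'.format(x)      # unicode template, unicode argument
wire = to_bytes(message)          # bytes again, only when leaving the program
```

Inside the sandwich every template that may receive x has to be unicode
as well, which is where the "u''" vs unicode_literals question comes in.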


> '{}'.format(x)
>
> Fails.
>
> To fix, I can change it to:
>
> u'{}'.format(x)
>

Seems like a legit way to bridge the gap between the default string
types of py2 and py3.
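
For the record, a rough sketch of why the plain '{}' template fails on
py2 while the u'{}' one works. As far as I understand it, py2 has to
turn the unicode x into bytes to fit it into a byte-str template, and it
uses the ascii codec for that. The explicit encode('ascii') below only
imitates that implicit step, so the snippet behaves the same on py3; the
name implicit_ok is mine:

```python
x = u'\u05d0'

# Imitate the implicit conversion py2 attempts for '{}'.format(x)
# when the template is a byte str and x is unicode.
try:
    x.encode('ascii')
    implicit_ok = True
except UnicodeEncodeError:
    implicit_ok = False

# With a unicode template no conversion is needed, on py2 and py3 alike.
result = u'{}'.format(x)

print(implicit_ok)  # False
print(result == x)  # True
```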


> Or I can import unicode_literals and keep the existing code line(s).
>
> Both work.
>
> In actual code, the assignment to x will/might be in a different
> module, and/or not contain a literal but user input, but '{}' _will_
> be a literal.
>
> Do people have preferences? Can people share their reasoning for their
> preferences? Do you think we should have policies, or it's up to each
> git repo, or even each patch author+maintainers/reviewers to decide?
>
> As discussed in the original [1], both have pros and cons. Personally
> I prefer "u''". But not strongly, because we try to keep our modules
> rather small, so it's not like you add a single import line that
> changes the semantics of hundreds or thousands of lines. Usually, it's
> rather easy to decide that such an import is ok. Ideally, we'd have
> full code coverage in our tests, including utf-8 everywhere, but I
> think we are quite far from that, for now.
>
> Thanks and best regards,
>
> >
> >>
> >> So I have to handle also all existing such literals, at least those
> >> that would now require handling unicode vars.
> >>
> >> >
> >> >>
> >> >>
> >> >> Personally, I do not see the big advantage of adding
> >> >> "six.text_type()" (15 chars) instead of a single "u". I do see
> >> >> where it can be useful, but not as a very long replacement, IMO,
> >> >> for "u", or for unicode_literals.
> >> >
> >> >
> >> > Once py2 is officially terminated, probably neither option
> >> > mentioned above will be meaningful, as unicode is py3's default
> >> > string type; however IMO for literals an explicit 'u' seems like a
> >> > more native approach, and provides clarity about the intentions of
> >> > the programmer compared to a global switch in the form of import
> >> > unicode_literals. Using six.text_type() is probably a good solution
> >> > nowadays for variables and not literals, and would probably have to
> >> > die off some day after py2 does the same.
> >> >
> >> >>
> >> >>
> >> >> Thanks and best regards,
> >> >>
> >> >> [1] http://python-future.org/unicode_literals.html
> >> >> --
> >> >> Didi
> >> >> _______________________________________________
> >> >> Devel mailing list -- [email protected]
> >> >> To unsubscribe send an email to [email protected]
> >> >> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> >> >> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> >> >> List Archives:
> https://lists.ovirt.org/archives/list/[email protected]/message/SW3P4VOGBP43N54CQEH3YURN6X5ZMWIX/
> >>
> >>
> >>
> >> --
> >> Didi
>
>
>
> --
> Didi
>

_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/AZJ2B24ZDTTX7Z6RFAXIWCREPT5VXKMX/
