On Sun, Sep 1, 2019 at 8:23 PM Yedidyah Bar David <[email protected]> wrote:
> On Sun, Sep 1, 2019 at 3:37 PM Amit Bawer <[email protected]> wrote:
> >
> > On Sun, Sep 1, 2019 at 2:34 PM Yedidyah Bar David <[email protected]> wrote:
> >>
> >> On Sun, Sep 1, 2019 at 1:20 PM Amit Bawer <[email protected]> wrote:
> >> >
> >> > On Sun, Sep 1, 2019 at 10:28 AM Yedidyah Bar David <[email protected]> wrote:
> >> >>
> >> >> Hi all,
> >> >>
> >> >> That's a "sub-thread" of "unicode sandwich in otopi/engine-setup".
> >> >>
> >> >> I was recommended to use 'six.text_type() over "u''". I did read [1],
> >> >> and eventually decided that my own preference is to just add "u"
> >> >> prefix. Reasoning is inside [1].
> >> >>
> >> >> Do people have different preferences/reasoning they want to share?
> >> >>
> >> >> Do people think we should have project-wide policy re this?
> >> >
> >> >
> >> > Since our code is currently transitioning from py2 to py2/py3, and not from py3 to py3/py2, it would be fair to assume that most
> >> > already existing string literals in it contain ascii symbols, unless explicitly stated otherwise;
> >> > so IMO it would only make sense to enforce 'u' over newly added literals which involve non-ascii symbols as long as py2 is still alive.
> >>
> >> Not exactly.
> >>
> >> Suppose (mostly correctly) that the code didn't employ the "unicode
> >> sandwich" technique so far. Meaning, much was handled as python2 str
> >> objects containing utf-8-encoded strings, and converted to unicode
> >> objects mainly as needed/noted/considered. Suppose that x is a
> >> variable that used to contain such an str, usually ascii-only, but
> >> sometimes perhaps utf-8. Now, this:
> >>
> >>     'x: {}'.format(x)
> >>
> >> would work, and replace {} with the contents of x, and return a
> >> python2 str, utf-8-encoded if x is utf-8. But if now x contains a
> >> unicode object (because we decided to follow the sandwich approach,
> >> and encode all utf-8 during input), it would fail, if x is not
> >> ascii-only.
> >> Adding u to 'x: {}' solves this.
> >
> >
> > utf-8 is an ascii extension, meaning that the first 128 ordinals agree for both encodings, so the unicode sandwich has no negative effect on your example.
> > It would only be a problem if the input for x originally had a non-ascii character in it, but that should have been an issue for py2 in the first place, regardless of py3 sandwiches.
>
> Let me clarify:

Thanks, now I see where I was wrong.

> In python2:
>
> If I start with:
>
> x='א'

py2: x is 2 bytes: '\xd7\x90'
py3: x is a unicode str with a single symbol '\u05d0'

> '{}'.format(x)
>
> Works.

py2: a byte str formatted into a byte str, with no implicit encoding or decoding, so it's fine.
py3: default unicode string, so it's fine.

> If I then employ the sandwich, and therefore effectively change the code
> to be:
>
> x=u'א'

Now py2 and py3 agree on the contents of x, so sandwiching seems like the right choice to make sure they treat x the same way.

> '{}'.format(x)
>
> Fails.
>
> To fix, I can change it to:
>
> u'{}'.format(x)

Seems like a legit option to bridge the default encoding gap between py2 and py3.

> Or, to import unicode_literals and keep the existing code line(s).
>
> Both work.
>
> In actual code, the assignment to x will/might be in a different
> module, and/or not contain a literal but user input, but '{}' _will_
> be a literal.
>
> Do people have preferences? Can people share their reasoning for their
> preferences? Do you think we should have policies, or it's up to each
> git repo, or even each patch author+maintainers/reviewers to decide?
>
> As discussed in the original [1], both have pros and cons. Personally
> I prefer "u''". But not strongly, because we try to keep our modules
> rather small, so it's not like you add a single import line that
> changes the semantics of hundreds or thousands of lines. Usually, it's
> rather easy to decide that such an import is ok. Ideally, we'd have
> full code coverage in our tests, including utf-8 everywhere, but I
> think we are quite far from that, for now.
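
For reference, a minimal sketch of the cases above that runs the same way on py2 and py3. It assumes nothing beyond the stdlib; the variable names are only illustrative. Since the py2 failure cannot be reproduced on py3 directly, the explicit encode('ascii') below stands in for the implicit ASCII encoding that py2 attempts when a str template formats a non-ascii unicode value.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only; names are not taken from otopi/engine-setup.

x = u'\u05d0'  # the letter alef as a text object on both py2 and py3

# A unicode template formats a unicode value without any implicit encoding.
formatted = u'x: {}'.format(x)

# The same character as utf-8 bytes, i.e. what a py2 str literal 'א' holds.
utf8_bytes = x.encode('utf-8')

# On py2, 'x: {}'.format(x) with a unicode x implicitly encodes x as ASCII.
# The explicit encode below reproduces that step on either interpreter.
try:
    x.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

With unicode_literals imported, the 'x: {}' template would become a unicode object too, so the existing line would behave like the u-prefixed one.
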
>
> Thanks and best regards,
>
> >>
> >> So I have to handle also all existing such literals, at least those
> >> that would now require handling unicode vars.
> >>
> >> >
> >> >>
> >> >> Personally, I do not see the big advantage of adding "six.text_type()"
> >> >> (15 chars) instead of a single "u". I do see where it can be useful,
> >> >> but not as a very long replacement, IMO, for "u", or for
> >> >> unicode_literals.
> >> >
> >> >
> >> > Once py2 will be officially terminated, probably neither option mentioned above would be meaningful as unicode is py3's default string encoding;
> >> > however IMO for literals it seems that an explicit 'u' is a more native approach, and provides clarity about the intentions of the programmer compared
> >> > to a global switch button in the form of import unicode_literals. Using six.text_type() is probably a good solution nowadays for variables and not literals,
> >> > and would probably have to die off some day after py2 does the same.
> >> >
> >> >>
> >> >> Thanks and best regards,
> >> >>
> >> >> [1] http://python-future.org/unicode_literals.html
> >> >> --
> >> >> Didi
> >> >> _______________________________________________
> >> >> Devel mailing list -- [email protected]
> >> >> To unsubscribe send an email to [email protected]
> >> >> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> >> >> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> >> >> List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/SW3P4VOGBP43N54CQEH3YURN6X5ZMWIX/
> >>
> >>
> >>
> >> --
> >> Didi
> >
>
> --
> Didi
>
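
On the "variables, not literals" point, here is a rough sketch of how the sandwich might look at an input boundary. The to_text helper is hypothetical, and text_type = type(u'') is only a dependency-free stand-in for what six.text_type names (unicode on py2, str on py3); real code would likely import it from six instead.

```python
# -*- coding: utf-8 -*-
# Hypothetical sketch of the "unicode sandwich"; not code from any oVirt repo.

text_type = type(u'')  # stand-in for six.text_type: unicode on py2, str on py3


def to_text(value, encoding='utf-8'):
    """Hypothetical helper: decode bytes on input, pass text through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


raw = b'\xd7\x90'              # bytes as they might arrive from a file or pipe
x = to_text(raw)               # decode once, at the input edge
message = u'x: {}'.format(x)   # literals carry an explicit u prefix
encoded = message.encode('utf-8')  # encode once, at the output edge
```

The variable goes through the helper while the template literal just gets a "u", which is roughly the split between the two tools suggested above.
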
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/AZJ2B24ZDTTX7Z6RFAXIWCREPT5VXKMX/
