Re: [discuss] pkg, python3 and unicode

Joshua M. Clulow Wed, 11 Mar 2020 22:51:14 -0700

On Wed, 11 Mar 2020 at 01:31, Till Wegmüller <[email protected]> wrote:
> AFAIK only C.UTF-8 is sane for languages such as Python.
> Everything else will cause a failure somewhere in the code, as it is
> simply too many calls / type conversions. We should ensure that zlogin
> (or in general zone enter code) enforces C.UTF-8 as locale and not C.


I don't think that's right.  The "zlogin" program is really no
different to "ssh" or any other interhost login mechanism; it should
generally inherit and respect whatever locale the user was using when
it was invoked.

Software in any language, including Python, needs to be written with
the data format in mind.  If IPS is processing data which is defined
to be UTF-8, then it really needs to use UTF-8 aware data types and
library routines for accessing that data, rather than depending on the
locale being set correctly.  If that UTF-8 data is subsequently
rendered for the user, it should then be converted to the active
locale for display.
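
As a rough illustration of that split, here is a minimal Python sketch.
The function names read_utf8 and render_for_locale are hypothetical and
are not taken from IPS; the only assumption is that the input bytes are
defined to be UTF-8, and that output should follow whatever encoding the
active locale reports.

```python
import locale
import sys


def read_utf8(raw):
    """Decode bytes that are *defined* to be UTF-8.

    The active locale is deliberately not consulted here; invalid
    input raises UnicodeDecodeError.
    """
    return raw.decode("utf-8")


def render_for_locale(text, encoding=None):
    """Encode text for display in the active locale.

    Characters the target encoding cannot represent are replaced
    with "?" rather than raising an error (a policy choice made
    for this sketch).
    """
    if encoding is None:
        # Ask the C library which encoding the current locale uses.
        encoding = locale.getpreferredencoding(False)
    return text.encode(encoding, errors="replace")


if __name__ == "__main__":
    name = read_utf8(b"Wegm\xc3\xbcller")
    sys.stdout.buffer.write(render_for_locale(name) + b"\n")
```

With this shape, the processing code behaves the same under C, C.UTF-8
or any other locale; only the final rendering step varies.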

This same property comes up a lot in Rust, where strings provided by
the OS (e.g., argv and the environment) are treated as byte arrays
until explicitly converted to a native UTF-8 string type for string
handling.  This conversion can fail if the input is not actually
UTF-8; or, it can optionally be lossy and replace invalid UTF-8
sequences with placeholders.  Either way, you are forced to choose a
policy and handle the different cases.
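
The same two policies can be written out in Python, the language this
thread is about.  This is only a sketch of the idea, not Rust's API:
decode_strict and decode_lossy are hypothetical helper names, and the
input is assumed to be raw bytes handed over by the OS.

```python
import os


def decode_strict(raw):
    """Fail loudly: raises UnicodeDecodeError if raw is not valid UTF-8."""
    return raw.decode("utf-8", errors="strict")


def decode_lossy(raw):
    """Never fail: invalid sequences become U+FFFD placeholders."""
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # os.environb exposes the environment as bytes where the platform
    # supports it; fall back to a fixed sample otherwise.
    if os.supports_bytes_environ:
        raw = os.environb.get(b"LANG", b"")
    else:
        raw = b"C"
    try:
        print(decode_strict(raw))
    except UnicodeDecodeError:
        print(decode_lossy(raw))
```

Which of the two is appropriate depends on the data: strict decoding
suits input that must round-trip exactly, while lossy decoding is
usually only acceptable for text that is shown to a person.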


Cheers.

-- 
Joshua M. Clulow
http://blog.sysmgr.org

------------------------------------------
illumos: illumos-discuss
Permalink: 
https://illumos.topicbox.com/groups/discuss/T6f78aa7809ef6ec3-M6e7b23e2cc7b975b3c77e521
