Further underlining the importance of proper UTF-8 support, the IETF has
mandated UTF-8 as the direction going forward, and several protocols have
been upgraded for UTF-8 support.


--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

________________________________________
From: IBM Mainframe Discussion List [IBM-MAIN@LISTSERV.UA.EDU] on behalf of 
David Crayford [dcrayf...@gmail.com]
Sent: Saturday, October 1, 2022 10:38 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: IBM python documentation?

On 2/10/22 01:40, Phil Smith III wrote:
> Jay Maynard wrote:
>> OK, so what kind of issues are there with UTF-8? Especially since it's
>> pretty much the standard everywhere, these days?
>
>
> Yeah, that caught my eye too. I suspect the answer is that *mixing* UTF-8
> and EBCDIC gets complicated because you cannot always convert: e.g., if you
> have <Greek character><Cyrillic character> in the same string, UTF-anything
> can handle it, but you cannot convert that string to EBCDIC because those
> two characters are in different EBCDIC code pages.
>
>
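Phil's mixed-script point can be checked with Python's built-in codecs. This is a minimal sketch, assuming Python 3: cp875 is Python's EBCDIC Greek codec, and cp037/cp500 are Latin-1 EBCDIC pages. Python ships no Cyrillic EBCDIC codec, so the sketch shows only that none of the available EBCDIC pages holds both characters, while UTF-8 does. The helper name `encodable_in` is hypothetical.

```python
# Greek small alpha (U+03B1) followed by Cyrillic small zhe (U+0436).
MIXED = "\u03b1\u0436"


def encodable_in(text: str, codec_names: list) -> dict:
    """Hypothetical helper: map each codec name to whether `text` encodes losslessly."""
    result = {}
    for name in codec_names:
        try:
            text.encode(name)  # default errors="strict" raises on a missing character
            result[name] = True
        except UnicodeEncodeError:
            result[name] = False
    return result


if __name__ == "__main__":
    report = encodable_in(MIXED, ["utf-8", "cp875", "cp037", "cp500"])
    for name, ok in report.items():
        print(f"{name:6} {'ok' if ok else 'cannot represent this string'}")
    # The Greek character on its own does fit the Greek EBCDIC page.
    print(encodable_in("\u03b1", ["cp875"]))
```

Any single-byte EBCDIC page has at most 256 slots, so mixing scripts from different pages forces a choice that UTF-8 never has to make.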

All good points. There are other issues when setting _BPX_AUTOCVT=ALL,
and things start to break depending on your system configuration.
If you set _BPX_AUTOCVT=ALL and use Rocket ported tools, they will break.
There are also issues, not specific to UTF-8, with file tagging and text
conversion in general. Unnamed pipes only do auto-conversion on one side
of the pipe, so any ported tools using unnamed pipes will have to switch
to named pipes. The language specifications for Golang, Python, JSON,
YAML, etc. all mandate UTF-8, but all the ported files on z/OS are tagged
ISO8859-1.
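Why an ISO8859-1 tag matters for formats that expect UTF-8: the two encodings agree only on 7-bit ASCII. This is a minimal sketch, assuming Python's `latin-1` codec as a stand-in for the z/OS ISO8859-1 tag (CCSID 819). The helper `same_in_latin1_and_utf8` is hypothetical.

```python
# Assumption: Python's "latin-1" codec stands in for the z/OS ISO8859-1 tag.


def same_in_latin1_and_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if `data` decodes identically as ISO8859-1 and UTF-8."""
    latin1_text = data.decode("latin-1")  # never fails: all 256 byte values map
    try:
        utf8_text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # e.g. a lone 0xFC byte is not valid UTF-8
    return latin1_text == utf8_text


ascii_yaml = "name: demo\n".encode("ascii")
latin1_yaml = "name: M\u00fcller\n".encode("latin-1")  # u-umlaut is one byte, 0xFC
utf8_yaml = "name: M\u00fcller\n".encode("utf-8")      # u-umlaut is two bytes, 0xC3 0xBC

for label, data in [("ascii", ascii_yaml), ("latin-1", latin1_yaml),
                    ("utf-8", utf8_yaml)]:
    print(label, same_in_latin1_and_utf8(data))

# Valid UTF-8 read through an ISO8859-1 tag turns into mojibake.
print(utf8_yaml.decode("latin-1"))  # name: MÃ¼ller
```

As long as a file stays pure ASCII, the wrong tag is harmless. The first byte at or above 0x80 is where the two interpretations diverge.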

Using the ISPF editor with tagged files can get interesting. It doesn't
work at all with UTF-8 if AUTOCVT(ALL) is set in BPXPRMxx. If
AUTOCVT(ON) is set and the user is using an emulator with a code page
other than IBM-1047, unpredictable results will occur. We had a
case opened with IBM when a customer using a German code page created
malformed YAML files using the ISPF editor. It took ages to diagnose, and
the PMR was closed WAD (working as designed). IBM asked us to open an RFE
to implement a new configuration option to prevent this unwanted
translation from occurring.
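The following is a rough illustration of how a German code page can corrupt YAML. It assumes that the emulator session uses CCSID 273 and that the host treats the incoming bytes as IBM-1047. Python has no cp1047 codec, so cp037 stands in for it; the two agree on the `{` and `}` characters used here. The sketch models only the byte reinterpretation, not ISPF or AUTOCVT themselves. `host_sees` is a hypothetical helper.

```python
# Assumptions: the emulator session uses CCSID 273 (German EBCDIC) and the
# host treats the bytes as IBM-1047. Python has no cp1047 codec, so cp037
# stands in; both place "{" at 0xC0 and "}" at 0xD0.


def host_sees(typed: str, terminal_cp: str = "cp273",
              host_cp: str = "cp037") -> str:
    """Hypothetical helper: what the host stores when code pages disagree."""
    raw = typed.encode(terminal_cp)  # bytes the emulator sends for each key
    return raw.decode(host_cp)       # characters the host reads from them


yaml_line = "env: {lang: de}"
seen = host_sees(yaml_line)
print(repr(yaml_line))
print(repr(seen))  # the braces come back as other characters

# Letters, digits, ":" and space are EBCDIC invariants, so they survive.
print(host_sees("abc: 123") == "abc: 123")
```

Because the invariant characters survive, most of the file looks fine. Only the YAML flow-mapping braces are silently replaced, which fits how hard such a case is to diagnose.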

>
> Combine that with UTF-8 normalization and variable-length characters and
> it's bewildering for EBCDIC-based minds.
>
> This does NOT really reflect deficiencies in UTF-8 but rather just
> difficulties switching between EBCDIC and UTF-8.
>
> ISO8859-1 is cleaner (for cases where it's sufficient!) because it CAN map
> 1:1 to EBCDIC. Of course it's not sufficient in many, many cases in a global
> economy.
>
> ...phsiii
>
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions,
> send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
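Phil's three points can each be seen in a few lines of Python: variable-length characters, normalization, and the 1:1 mapping between ISO8859-1 and EBCDIC. This is a minimal sketch that again uses cp037 as the EBCDIC Latin-1 page, because Python lacks cp1047. The helper names are hypothetical.

```python
import unicodedata


def utf8_lengths(chars: list) -> list:
    """Hypothetical helper: bytes needed per character in UTF-8 (1 to 4)."""
    return [len(ch.encode("utf-8")) for ch in chars]


def latin1_maps_one_to_one(ebcdic_codec: str) -> bool:
    """Hypothetical helper: True if all 256 ISO8859-1 characters get distinct bytes."""
    all_latin1 = bytes(range(256)).decode("latin-1")
    try:
        encoded = all_latin1.encode(ebcdic_codec)
    except UnicodeEncodeError:
        return False  # some Latin-1 character has no slot in this page
    # One byte per character, and no two characters share a byte.
    return len(encoded) == 256 and len(set(encoded)) == 256


# 1) Variable length: "A", e-acute, the euro sign, and an emoji.
print(utf8_lengths(["A", "\u00e9", "\u20ac", "\U0001F600"]))  # [1, 2, 3, 4]

# 2) Normalization: the same visible e-acute as one code point (NFC) or as
#    "e" plus U+0301 COMBINING ACUTE ACCENT (NFD).
nfc = unicodedata.normalize("NFC", "e\u0301")
nfd = unicodedata.normalize("NFD", "\u00e9")
print(nfc == nfd, nfc.encode("utf-8"), nfd.encode("utf-8"))
# False b'\xc3\xa9' b'e\xcc\x81'

# 3) ISO8859-1 fits an EBCDIC Latin-1 page exactly; a Greek page cannot.
print(latin1_maps_one_to_one("cp037"), latin1_maps_one_to_one("cp875"))
```

Byte-oriented code that assumes "one character, one byte" or compares strings without normalizing will trip over the first two points. The third is why ISO8859-1 conversion feels so much tidier from the EBCDIC side.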
