On 18 October 2017 at 13:26, Jonathan Scott wrote:

[so glad to hear that you're looking into this stuff!]

> I think that my preferred solution would be to specify the EBCDIC
> code page for all purposes using the CODEPAGE option (or perhaps
> a new CCSID option) and to assume that ASCII generally means ISO
> 8859-1 code page 819, with a full 256-byte translate table.  (A
> CCSID option might even make it possible to convert mixed DBCS
> constants to UTF-8 in the distant future.)  However, the main
> complication is how to get to that position without requiring
> lots of new options, making it as easy as possible to get it
> right, while avoiding breaking compatibility.
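
As a rough illustration of the "full 256-byte translate table" idea, here is a hypothetical Python sketch, not how the assembler does it. Python's standard library has no cp1047 codec, so cp037 stands in for the EBCDIC side, and the latin_1 codec stands in for ISO 8859-1 (CP 819). Every one of the 256 cp037 code points maps to a distinct Latin-1 character, so the table is a permutation and can be inverted losslessly.

```python
# Hypothetical sketch: build a full 256-byte EBCDIC -> ISO 8859-1 table.
# Assumptions: Python's stdlib "cp037" codec stands in for the EBCDIC
# code page (the stdlib has no cp1047); "latin_1" stands in for CP 819.

def build_table(ebcdic: str = "cp037") -> bytes:
    """Return a 256-byte table: entry i is the Latin-1 byte for EBCDIC byte i."""
    # Decode every possible byte value, then re-encode as Latin-1.
    # This raises UnicodeEncodeError if any EBCDIC code point has no
    # Latin-1 equivalent, i.e. if the mapping is not a full table.
    return bytes(range(256)).decode(ebcdic).encode("latin_1")


E2A = build_table()
# Inverse table for the other direction; index() raises ValueError
# if E2A is not a permutation of 0..255.
A2E = bytes(E2A.index(i) for i in range(256))

# 'HELLO' in cp037 is X'C8C5D3D3D6'.
ascii_text = b"\xc8\xc5\xd3\xd3\xd6".translate(E2A)
print(ascii_text)  # b'HELLO'
```

Because the table covers all 256 byte values, no input byte is left untranslated, which is the point of preferring it over a partial table.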

I think allowing the user to specify both the inbound and (one or
more) outbound code pages, and passing those to Unicode Conversion
Services would be the Right Thing To Do. I'm not sure to what
extent such conversion is supported on non-z/OS platforms (VM, VSE,
zLinux?), but the notion of externalizing it to the experts, and
effectively putting it under user control because the user can supply
their own tables if they want, is good. Is there currently a
fundamental constraint that limits the output to the same character
size as the input? For example, if I specify that the input is CP 1047 and
the output is CP 1208 (= UTF-8), could a constant CA'€' correctly turn
into X'E282AC', the three-byte UTF-8 encoding of U+20AC? (I'm assuming
for the example that it's the CA constant type that's subject to this
translation.) In other words, internally
does the code assume that a 1-byte string must generate a 1-byte
value?
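
To make the 1-byte-in, 3-bytes-out case concrete, here is a hypothetical Python sketch. It uses Python's cp1140 codec (CCSID 1140, i.e. CP 037 with € at X'9F') as a stand-in for the source code page, since the stdlib has no cp1047 codec, and utf_8 in the role of CP 1208. The function name convert_constant is invented for illustration and is not an existing assembler interface.

```python
# Hypothetical sketch: convert the bytes of a character constant from a
# source code page to a target code page without assuming that the
# output length equals the input length.
# Assumptions: "cp1140" (EBCDIC 037 + euro at X'9F') stands in for the
# source code page; "utf_8" plays the role of CP 1208.

def convert_constant(data: bytes, source: str, target: str) -> bytes:
    """Decode from the source code page, then re-encode in the target one."""
    return data.decode(source).encode(target)


euro_ebcdic = "€".encode("cp1140")  # one byte
euro_utf8 = convert_constant(euro_ebcdic, "cp1140", "utf_8")

print(euro_ebcdic.hex().upper())  # 9F
print(euro_utf8.hex().upper())    # E282AC
```

Going through Unicode as the pivot is what lets the length change; a byte-for-byte translate table cannot produce more bytes than it consumes. With a UTF-16 target the same character would come out as the two bytes X'20AC'.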

Another Right Thing would be to allow these options (or at least the
"output" one(s)) to be changed on the fly with a *PROCESS (or
whatever) statement.

I realize you don't want to be adding a zillion options, but perhaps
one to flag (or not) conversion failures or probable "surprises" would
be wise.
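
One possible shape for such a flag, as a hypothetical Python sketch: the function name, the substitute byte and the warning text are all invented for illustration, and Python codecs stand in for the real conversion service. Instead of silently substituting, it reports each character the target code page cannot represent, for example the euro sign when the target is ISO 8859-1 (CP 819), which lacks it.

```python
# Hypothetical sketch of a "flag conversion failures" option.
# Assumptions: Python codecs stand in for the conversion service;
# "latin_1" plays the role of CP 819; all names are illustrative only.

def convert_with_flags(text: str, target: str, substitute: bytes = b"?"):
    """Encode text in the target code page, returning (bytes, warnings).

    Characters the target cannot represent are replaced by `substitute`
    and reported rather than dropped silently. Encoding one character at
    a time assumes a stateless target (e.g. a single-byte code page).
    """
    out = bytearray()
    warnings = []
    for pos, ch in enumerate(text):
        try:
            out += ch.encode(target)
        except UnicodeEncodeError:
            out += substitute
            warnings.append(f"column {pos + 1}: U+{ord(ch):04X} not in {target}")
    return bytes(out), warnings


data, warns = convert_with_flags("Cost: 5€", "latin_1")
print(data)   # b'Cost: 5?'
print(warns)  # ['column 8: U+20AC not in latin_1']
```

With the flag off, the same routine would simply return the substituted bytes and discard the warnings.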

Tony H.
