I'm with you on the "outsourcing" to Unicode services. It's what I do in "my" 
product. As you say, leave it to the experts.

I'm not with you on turning one character into two bytes. It's assembler, man, 
it's not an HLL (despite the name!). It's supposed to be 1:1 with what the 
developer codes, not "solve my problem in the best way you can."

> could a constant CA'€' correctly turn into X'20AC'

What if I code MVI TARGET,CA'€' -- how does that assemble if CA'€' is two bytes?
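
To make the size problem concrete, here is a rough sketch in Python (not HLASM, and not a claim about what the assembler does internally). It just counts the bytes the euro sign needs under a few encodings; the helper names are mine, made up for illustration. Note that X'20AC' is the UTF-16 form; in UTF-8 the same character is the three bytes X'E282AC'. Either way it would not fit the one-byte immediate field of an MVI.

```python
# Illustrative sketch only: how many bytes does the euro sign need?
# encoded_len and fits_one_byte_immediate are hypothetical helper names.
EURO = "\u20ac"

def encoded_len(ch, codec):
    """Return the encoded length in bytes, or None if the codec cannot map ch."""
    try:
        return len(ch.encode(codec))
    except UnicodeEncodeError:
        return None

def fits_one_byte_immediate(ch, codec):
    """True only if ch encodes to exactly one byte, as an MVI immediate requires."""
    return encoded_len(ch, codec) == 1

# cp1140 is an EBCDIC page with the euro; latin-1 (ISO 8859-1) has no euro at all.
sizes = {codec: encoded_len(EURO, codec)
         for codec in ("cp1140", "iso8859_15", "latin-1", "utf-16-be", "utf-8")}

for codec, size in sizes.items():
    print(codec, size, fits_one_byte_immediate(EURO, codec))
```

Only the single-byte code pages that actually contain the euro pass the check; ISO 8859-1 has no mapping for it, and both Unicode encodings need more than one byte.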

Charles


-----Original Message-----
From: IBM Mainframe Assembler List [mailto:[email protected]] On 
Behalf Of Tony Harminc
Sent: Wednesday, October 18, 2017 6:02 PM
To: [email protected]
Subject: Re: ASCII self-defining constants

On 18 October 2017 at 13:26, Jonathan Scott <[email protected]> wrote:

[so glad to hear that you're looking into this stuff!]

> I think that my preferred solution would be to specify the EBCDIC code 
> page for all purposes using the CODEPAGE option (or perhaps a new 
> CCSID option) and to assume that ASCII generally means ISO
> 8859-1 code page 819, with a full 256-byte translate table.  (A CCSID 
> option might even make it possible to convert mixed DBCS constants to 
> UTF-8 in the distant future).  However, the main complication is how 
> to get to that position without requiring lots of new options, making 
> it as easy as possible to get it right, while avoiding breaking 
> compatibility.

I think allowing the user to specify both the inbound and (one or
more) outbound code pages, and passing those to Unicode Conversion Services 
would be the Right Thing To Do. I'm not sure to what extent such conversion 
is supported on non z/OS platforms (VM, VSE, zLinux?), but the notion of 
externalizing it to the experts, and effectively putting it under user control 
because the user can supply their own tables if they want, is good. Is there 
currently a fundamental constraint that limits the output to the same character 
size as the input? For example, if I specify the input is CP 1047, and the 
output is CP 1208 (= UTF-8), could a constant CA'€' correctly turn into 
X'20AC'? (I'm assuming for the example that it's the CA constant type that's 
subject to this translation.) In other words, internally does the code assume 
that a 1-byte string must generate a 1-byte value?
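
For comparison, the full 256-byte translate table idea from the quoted text can be sketched in a few lines of Python. This is only an illustration under assumptions: cp037 stands in for the EBCDIC page (it is a convenient page that maps one-to-one onto ISO 8859-1), and the names ebcdic_to_latin1 and translate_constant are hypothetical. A table like this is one byte in, one byte out by construction, which is exactly the assumption the question above is probing; it has no way to express a character that needs two or three output bytes.

```python
# Illustrative sketch: build a 256-byte EBCDIC -> ISO 8859-1 translate table.
# cp037 is used as a stand-in EBCDIC code page; the names are hypothetical.
EBCDIC_CODEC = "cp037"

# Entry i holds the ISO 8859-1 byte for EBCDIC byte i.
ebcdic_to_latin1 = bytes(range(256)).decode(EBCDIC_CODEC).encode("latin-1")

def translate_constant(ebcdic_bytes):
    """Translate byte for byte, much as a TR instruction would with this table."""
    return ebcdic_bytes.translate(ebcdic_to_latin1)

sample = "HELLO".encode(EBCDIC_CODEC)   # X'C8C5D3D3D6' in EBCDIC
print(translate_constant(sample))       # same length out as in
```

Because the output length always equals the input length, anything like UTF-8 output would need a different mechanism than a plain table.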
