I'm with you on the "outsourcing" to Unicode services. It's what I do in "my" product. As you say, leave it to the experts.
I'm not with you on the one character to two. It's assembler, man, it's not an HLL (despite the name!). It's supposed to be 1:1 with what the developer codes, not "solve my problem in the best way you can."

> could a constant CA'€' correctly turn into X'20AC'

What if I code MVI TARGET,CA'€' -- how does that assemble if CA'€' is two bytes?

Charles

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:[email protected]] On Behalf Of Tony Harminc
Sent: Wednesday, October 18, 2017 6:02 PM
To: [email protected]
Subject: Re: ASCII self-defining constants

On 18 October 2017 at 13:26, Jonathan Scott <[email protected]> wrote:

[so glad to hear that you're looking into this stuff!]

> I think that my preferred solution would be to specify the EBCDIC code
> page for all purposes using the CODEPAGE option (or perhaps a new
> CCSID option) and to assume that ASCII generally means ISO
> 8859-1 code page 819, with a full 256-byte translate table. (A CCSID
> option might even make it possible to convert mixed DBCS constants to
> UTF-8 in the distant future). However, the main complication is how
> to get to that position without requiring lots of new options, making
> it as easy as possible to get it right, while avoiding breaking
> compatibility.

I think allowing the user to specify both the inbound and (one or more) outbound code pages, and passing those to Unicode Conversion Services would be the Right Thing To Do. I'm not sure to what extent such conversion is supported on non-z/OS platforms (VM, VSE, zLinux?), but the notion of externalizing it to the experts, and effectively putting it under user control because the user can supply their own tables if they want, is good.

Is there currently a fundamental constraint that limits the output to the same character size as the input? For example, if I specify the input is CP 1047, and the output is CP 1208 (= UTF-8), could a constant CA'€' correctly turn into X'20AC'? (I'm assuming for the example that it's the CA constant type that's subject to this translation.) In other words, internally does the code assume that a 1-byte string must generate a 1-byte value?
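
To put numbers on the size question, here is a rough sketch in Python (not assembler, and nothing to do with how HLASM or Unicode Conversion Services work internally). It only prints how many bytes the euro sign occupies under a few encodings. The `encoded_hex` helper is made up for the illustration, and `cp1140` is an assumption: it stands in for the EBCDIC input side simply because it is a built-in Python codec that contains the euro sign. Note that X'20AC' is the UTF-16 form of the character; the UTF-8 form is three bytes.

```python
# Illustration only: byte length of the euro sign under different encodings.
# cp1140 is a stand-in EBCDIC code page (an assumption for this sketch).
EURO = "\u20ac"


def encoded_hex(text, codec):
    """Hypothetical helper: encode text and return uppercase hex digits."""
    return text.encode(codec).hex().upper()


for codec in ("cp1140", "utf-8", "utf-16-be"):
    digits = encoded_hex(EURO, codec)
    print(f"{codec:10} X'{digits}' ({len(digits) // 2} byte(s))")
```

So a one-byte EBCDIC source character can become two or three bytes on output, which is exactly the case that a one-byte immediate operand such as the one in MVI cannot hold.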
