I agree that the design of the MACROCASE option was confusing, but it was too late to change it by the time I got involved!

ASCII and Unicode constants only support characters which can be entered in the current EBCDIC SBCS input code page. If the input code page is CECP, that is essentially the same character set as the first 256 code points of Unicode (and ISO 8859-1), including the usual western European accented characters. If it is Euro, the Euro symbol is supported. If it is Latin-9 code page 924, some other European accented letters are supported.

The Unicode representation for generated constants is selected using the UNICODE option to specify the corresponding code page number and the CODEPAGE(LOCAL) option to specify that conversion from EBCDIC should use standard internal tables. The following are supported:

  UTF-16BE: 1200
  UTF-16LE: 1202
  UTF-8:    1208

Examples:

CU'é' (e with acute accent) with UNICODE code page 1200 gives x'00E9', with 1202 gives x'E900' and with 1208 gives x'C3A9'.

CU'€' (Euro) with a Euro EBCDIC code page and UNICODE code page 1200 gives x'20AC', with 1202 gives x'AC20' and with 1208 gives x'E282AC'.

There are currently no supported characters which overflow UTF-16, so there is no issue with surrogate codes.

The implementation of UTF-8 was particularly tricky because the current EBCDIC and UNICODE options in effect may affect the number of output bytes for a given input byte. This means that if a DC for a CU-type constant gets deferred, the assembler must keep track of the EBCDIC and UNICODE options which were in effect for that statement and use them for any subsequent retry.

(As I'm now retired, I no longer have access to IBM internal information, so some of the above is from memory, but I hope I remembered it correctly.)
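
For anyone who wants to cross-check the byte values above outside the assembler, here is a minimal Python sketch. It is not how HLASM does it internally. It assumes Python's cp037 and cp1140 codecs are reasonable stand-ins for a CECP and a Euro EBCDIC input code page, and the UNICODE_CODECS table and the cu_constant function are hypothetical names made up for this illustration.

```python
# Hypothetical mapping from the UNICODE code page numbers listed above
# to Python codec names (an assumption for illustration only).
UNICODE_CODECS = {1200: "utf-16-be", 1202: "utf-16-le", 1208: "utf-8"}


def cu_constant(ebcdic_bytes, ebcdic_codec, unicode_ccsid):
    """Convert EBCDIC source bytes to hex digits of the Unicode constant.

    ebcdic_codec is a Python codec standing in for the input code page
    (for example "cp037" for CECP or "cp1140" for Euro).
    """
    text = ebcdic_bytes.decode(ebcdic_codec)
    return text.encode(UNICODE_CODECS[unicode_ccsid]).hex().upper()


if __name__ == "__main__":
    # e with acute accent is byte X'51' in code page 037.
    for ccsid in (1200, 1202, 1208):
        print("e-acute", ccsid, cu_constant(b"\x51", "cp037", ccsid))

    # The Euro sign is byte X'9F' in code page 1140.
    for ccsid in (1200, 1202, 1208):
        print("euro   ", ccsid, cu_constant(b"\x9f", "cp1140", ccsid))

    # The same input byte under a non-Euro code page is the currency
    # sign, which needs a different number of UTF-8 output bytes.
    print("X'9F' as cp037 ", cu_constant(b"\x9f", "cp037", 1208))
    print("X'9F' as cp1140", cu_constant(b"\x9f", "cp1140", 1208))
```

The last two lines show why the options matter for length: the same input byte X'9F' comes out as two UTF-8 bytes under one input code page and three under the other, so the length of a CU constant cannot be known without the options that were in effect for that statement.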

Jonathan Scott

-----Original Message-----
From: IBM Mainframe Assembler List <ASSEMBLER-LIST@LISTSERV.UGA.EDU> On Behalf Of Paul Gilmartin
Sent: 28 August 2025 17:58
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Is HLASM efficient WAS: Telum and SpyreWAS: Vector instruction performance

On 8/28/25 02:20, Jonathan Scott wrote:
> HLASM itself contains a lot of mixed-case assembler source, and the HLASM
> operating system interfaces for MVS, CMS and Linux are mostly written in
> mixed-case PL/X. There are indeed some limitations on macro keyword values,
> but an increasing proportion of macros have been coded or modified to support
> lower case values.
> ...
I consider MACROCASE to be a design blunder. For all other options, COMPAT(option) provides behavior compatible with Assembler H; COMPAT(NO option) provides incompatible behavior. However, for the *SAME* source code, COMPAT(MACROCASE) provides incompatible behavior; for compatible behavior, COMPAT(NOMACROCASE) is necessary.

> Data type CU is Unicode, which has nothing to do with upper case. A macro
> can convert a string to upper case using the UPPER built-in function.
> ...
How does that work? can I code something as simple as:
    DC CU'π'
and get the value of x'cf80' for CCSID 1209?

--
Thanks,
gil