On Mon, 10 Feb 2020 07:58:26 -0600, Bill Godfrey wrote:

>Given a USS file utf16.txt containing 6 UTF-16 characters, 12 bytes:
>
>>od -tx1 -An utf16.txt
> 00 28 20 1C 00 61 20 1D 00 29 00 0A
>
>U+0028 is left parenthesis
>U+201C is left double quotation mark
>U+0061 is small letter "a"
>U+201D is right double quotation mark
>U+0029 is right parenthesis
>
>There are no corresponding quotation marks in EBCDIC 1047.
>The iconv command converts them to hex 3F.
>
>>iconv -f 1200 -t ibm-1047 <utf16.txt | od -tx1 -tc -An
> 4D 3F 81 3F 5D 15
> ( 077 a 077 ) \n
>
I submitted an RCF a couple days ago. This should be documented.
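As a rough cross-check of the conversion quoted above, here is a minimal Python sketch. It is not how z/OS iconv works internally; it only mimics the observed result. Assumptions: as far as I know Python's standard library has no IBM-1047 codec, so cp037 stands in for it (the parentheses, "a", and SUB have the same byte values in the quoted output); the trailing newline is left out to keep the comparison to the printable characters; and the handler name "sub" and the function sub_handler are made up for this illustration.

```python
import codecs


def sub_handler(exc):
    """Hypothetical error handler: replace each unencodable character with U+001A (SUB)."""
    if isinstance(exc, UnicodeEncodeError):
        # The replacement string is encoded by the target codec itself,
        # so SUB comes out as 3F in EBCDIC and 1A in ASCII.
        return ("\x1a" * (exc.end - exc.start), exc.end)
    raise exc


# "sub" is an illustrative name, not a built-in Python error handler.
codecs.register_error("sub", sub_handler)

# The first 10 bytes of utf16.txt from the quoted od output (newline omitted).
utf16 = bytes.fromhex("0028201C0061201D0029")
text = utf16.decode("utf-16-be")

# cp037 is a stand-in for IBM-1047 (assumption, see above).
ebcdic = text.encode("cp037", errors="sub")
ascii_out = text.encode("ascii", errors="sub")

print(ebcdic.hex(" ").upper())     # 4D 3F 81 3F 5D
print(ascii_out.hex(" ").upper())  # 28 1A 61 1A 29
```

With this handler the two quotation marks come out as 3F for the EBCDIC target and as 1A for the ASCII target. That says nothing about what any particular iconv does with an ASCII-based target.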
Hex 3F is SUBstitute, intended as a substitute for untranslatable
characters. Good. If the target code page is ASCII-based, does it
produce the corresponding hex 1A? It should, despite the risk that
CP/M (and old MS-DOS?) misused SUB as an end-of-text-file marker.

POSIX leaves the effect of errors implementation-defined. Tests on
macOS and Linux produce chaotic results, symptomatic of bugs in their
iconv utilities. Linux seems to require a BOM with UTF-16.

-- gil