On Mon, 10 Feb 2020 07:58:26 -0600, Bill Godfrey wrote:

>Given a USS file utf16.txt containing 6 UTF-16 characters, 12 bytes:
>
>>od -tx1 -An utf16.txt
>    00  28  20  1C  00  61  20  1D  00  29  00  0A
>
>U+0028 is left parenthesis
>U+201C is left double quotation mark
>U+0061 is small letter "a"
>U+201D is right double quotation mark
>U+0029 is right parenthesis
>
>There are no corresponding quotation marks in EBCDIC 1047.
>The iconv command converts them to hex 3F.
>
>>iconv -f 1200 -t ibm-1047 <utf16.txt | od -tx1 -tc -An
>    4D  3F  81  3F  5D  15
>     ( 077   a 077   )  \n
>
I submitted an RCF a couple of days ago.  This substitution behavior
should be documented.

Hex 3F is the EBCDIC SUB (substitute) control character, intended as a
stand-in for untranslatable characters.  Good.  If the target code page
is ASCII-based, does iconv produce the corresponding hex 1A?  It should,
despite the risk that CP/M (and old MS-DOS?) misused SUB as an
end-of-text-file marker.
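
To make the question concrete, here is a rough Python sketch of the
behavior I would expect.  It is not what any iconv actually does.
Assumptions: Python ships no 1047 codec as far as I know, so cp037
stands in for it (the newline therefore comes out as hex 25 rather than
the 15 shown above), and the handler name "sub" is made up for this
example.

```python
import codecs


def sub_handler(exc):
    """Replace each unencodable character with U+001A (SUB)."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    # One SUB per untranslatable character, then resume after the run.
    return ("\x1a" * (exc.end - exc.start), exc.end)


# "sub" is a hypothetical handler name chosen for this sketch.
codecs.register_error("sub", sub_handler)

# The 12 bytes of utf16.txt from the quoted od output.
utf16 = bytes.fromhex("0028201C0061201D0029000A")
text = utf16.decode("utf-16-be")

ebcdic = text.encode("cp037", errors="sub")   # cp037 stands in for 1047
ascii_ = text.encode("ascii", errors="sub")

print(ebcdic.hex(" "))  # 4d 3f 81 3f 5d 25
print(ascii_.hex(" "))  # 28 1a 61 1a 29 0a
```

The same handler yields hex 3F for the EBCDIC target and hex 1A for the
ASCII target, because U+001A maps to SUB in both.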

POSIX leaves the effect of such conversion errors implementation-defined.

Tests on macOS and Linux produce chaotic results, symptomatic
of bugs in their iconv utilities.  Linux seems to require a BOM with
UTF-16.
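
As an illustration of why a BOM can matter, the following Python sketch
decodes the same 12 bytes three ways.  It shows Python's codecs only; I
am assuming, not asserting, that something similar lies behind the Linux
behavior.

```python
# The 12 bytes of utf16.txt from the quoted od output (big-endian, no BOM).
data = bytes.fromhex("0028201C0061201D0029000A")

# Explicit big-endian decoding needs no BOM.
as_be = data.decode("utf-16-be")

# With a BOM (FE FF) prepended, the generic "utf-16" codec picks the
# right byte order and strips the BOM.
with_bom = (b"\xfe\xff" + data).decode("utf-16")

# Read as little-endian, every code unit is byte-swapped: U+0028
# becomes U+2800, U+201C becomes U+1C20, and so on.
as_le = data.decode("utf-16-le")

print([f"U+{ord(c):04X}" for c in as_be])
print([f"U+{ord(c):04X}" for c in as_le])
```

Without a BOM or an explicit -be/-le suffix, the decoder has to guess
the byte order, and a wrong guess produces valid but meaningless
characters rather than an error.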

-- gil

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
