James K. Lowden wrote:
> In GNU iconv, the value 0xFF does not convert to the same value in
> Unicode.  
> 
>       For UTF-8 to CP1140, 0xFF becomes 0xDF
>       For CP1140 to UTF-8, 0xFF becomes 0x9F

In all charset converters that I've consulted (glibc 2.23, GNU libiconv,
ICU 2.2, JDK 1.5, Windows 2000, Windows 2016, AIX 4.3.2, z/OS), the
character set IBM-1140 or CP1140
  - maps 0xFF to U+009F (see also Wikipedia [1]),
  - maps 0xDF to U+00FF.
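
This is easy to check programmatically. Below is a minimal sketch
using the POSIX iconv() API (assuming a glibc or GNU libiconv that
accepts the converter name "CP1140"):

    #include <stdio.h>
    #include <iconv.h>

    int main (void)
    {
      /* Convert the single CP1140 byte 0xFF to UTF-8.  */
      iconv_t cd = iconv_open ("UTF-8", "CP1140");
      if (cd == (iconv_t) -1)
        { perror ("iconv_open"); return 1; }

      char in[1] = { (char) 0xFF };
      char out[8];
      char *inptr = in, *outptr = out;
      size_t inleft = sizeof in, outleft = sizeof out;

      if (iconv (cd, &inptr, &inleft, &outptr, &outleft) == (size_t) -1)
        { perror ("iconv"); return 1; }

      /* Expected output: 0xC2 0x9F, the UTF-8 encoding of U+009F.  */
      for (char *p = out; p < outptr; p++)
        printf ("0x%02X ", (unsigned char) *p);
      printf ("\n");

      iconv_close (cd);
      return 0;
    }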

> COBOL has a notion of "high-value", which is guaranteed to be the
> "highest" value in a character set.  The reference manual for COBOL
> from IBM states:
> 
>       For alphanumeric data with the EBCDIC collating sequence, 
>       [HIGH-VALUE] is X'FF'.

And what about double-byte EBCDIC?

If you want the highest Unicode code point, use U+10FFFF. It lies in
plane 16, the Supplementary Private Use Area-B (although U+10FFFF
itself is classified as a noncharacter).

Note that U+00FF is not the "highest" Unicode code point; it's only
the highest code point in the ISO-8859-1 subset.
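
For concreteness, here is a small sketch (a hand-rolled UTF-8
encoder, just for illustration) that prints the byte sequences of
both code points:

    #include <stdio.h>

    /* Encode a Unicode code point (<= U+10FFFF) as UTF-8;
       return the number of bytes written.  */
    static int utf8_encode (unsigned int uc, unsigned char *buf)
    {
      if (uc < 0x80)    { buf[0] = uc; return 1; }
      if (uc < 0x800)   { buf[0] = 0xC0 | (uc >> 6);
                          buf[1] = 0x80 | (uc & 0x3F); return 2; }
      if (uc < 0x10000) { buf[0] = 0xE0 | (uc >> 12);
                          buf[1] = 0x80 | ((uc >> 6) & 0x3F);
                          buf[2] = 0x80 | (uc & 0x3F); return 3; }
      buf[0] = 0xF0 | (uc >> 18);
      buf[1] = 0x80 | ((uc >> 12) & 0x3F);
      buf[2] = 0x80 | ((uc >> 6) & 0x3F);
      buf[3] = 0x80 | (uc & 0x3F);
      return 4;
    }

    int main (void)
    {
      unsigned int cps[] = { 0x00FF, 0x10FFFF };
      for (int i = 0; i < 2; i++)
        {
          unsigned char buf[4];
          int n = utf8_encode (cps[i], buf);
          printf ("U+%04X ->", cps[i]);
          for (int j = 0; j < n; j++)
            printf (" 0x%02X", buf[j]);
          printf ("\n");
        }
      return 0;
    }

This prints "U+00FF -> 0xC3 0xBF" and "U+10FFFF -> 0xF4 0x8F 0xBF
0xBF": the two code points are unrelated at the byte level.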

> Given IBM's statement, to these innocent eyes it looks like a bug.

A COBOL language specification has no bearing on how character set
conversion tables are defined. In other words: like it or not,
that's how IBM-1140 is defined.

Bruno

[1] https://en.wikipedia.org/wiki/EBCDIC


