Re: FTP converting between UTF-8 and EBCDIC

Seymour J Metz Tue, 17 Nov 2020 07:23:10 -0800

Why would you presume that the character is not in the target character set? 
Certainly there are EBCDIC character sets containing accented letters, and 
there are the issues of GE and SI/SO. In either direction, encodings of 
characters can legitimately change length.
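To make the length point concrete, here is a small Python check. It assumes Python's built-in cp037 codec (IBM code page 037) as a stand-in for whatever EBCDIC CCSID the host actually uses. It only illustrates the arithmetic and does not model z/OS FTP.

```python
# Compare the byte length of the same text in UTF-8 and in EBCDIC code page 037.
# cp037 is an illustrative stand-in for the host's real EBCDIC CCSID.

def byte_lengths(text):
    """Return (UTF-8 length, cp037 length) for text that cp037 can represent."""
    return len(text.encode("utf-8")), len(text.encode("cp037"))

word = "café"                         # 'é' exists in cp037 (X'51'); no substitution needed
utf8_len, ebcdic_len = byte_lengths(word)
print(utf8_len, ebcdic_len)           # 5 4: the accented letter shrinks from two bytes to one
print(word.encode("cp037").hex())     # 83818651

# The correct translation is lossless: decoding the EBCDIC bytes gives the text back.
assert word.encode("cp037").decode("cp037") == word
```

Going the other way, EBCDIC to UTF-8, the same record grows by one byte, and nothing is lost in either direction.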



--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3


________________________________________
From: IBM Mainframe Discussion List <[email protected]> on behalf of 
Paul Gilmartin <[email protected]>
Sent: Tuesday, November 17, 2020 10:12 AM
To: [email protected]
Subject: Re: FTP converting between UTF-8 and EBCDIC

On Tue, 17 Nov 2020 13:29:32 +0000, Seymour J Metz wrote:

>NFW; unless the documentation describes such bizarre behavior, it should *NOT* 
>translate characters to SUB when there is a correct translation. If you want 
>to preserve the length then use a character set in which all characters are 8 
>bits, e.g., ISO-8869=15.

"8869="?

Pedant!  I easily presumed "not in the target character set" as Charles's 
intent.
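Metz's suggestion quoted above (use a single-byte character set such as ISO-8859-15) works because every character is one byte on both sides. A quick Python check, assuming Python's iso8859_15 codec for the ASCII side and cp1140 (the euro-enabled variant of code page 037) for the EBCDIC side. Both codec choices are illustrative.

```python
# With single-byte character sets on both ends, every character occupies
# exactly one byte, so conversion preserves record length.
text = "Prix: 12 € / café"

latin9 = text.encode("iso8859_15")   # ISO-8859-15 (Latin-9) has the euro sign at X'A4'
ebcdic = text.encode("cp1140")       # code page 1140 has the euro sign at X'9F'
utf8 = text.encode("utf-8")          # the euro sign takes 3 bytes, 'é' takes 2

print(len(text), len(latin9), len(ebcdic), len(utf8))   # 17 17 17 20

# Byte-for-byte translation between the two single-byte sets keeps the length.
assert latin9.decode("iso8859_15").encode("cp1140") == ebcdic
```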

________________________________________
From:  Charles Mills
>Sent: Monday, November 16, 2020 4:14 PM

>If you tell FTP that the non-EBCDIC file is UTF-8 then FTP *should* convert
>accented characters and such to EBCDIC SUB (X'3F') rather than to two bytes.
>Should. YMMV.

I understand this to be common practice, as described in:
    
https://secure-web.cisco.com/1CU1T9ZeyF6UHYCkeSdBDfGguunRASoBn4ZgLNTkuMZb59Yo5npGD_H2Pb6G5bijvF5VlGkp1sQk8zyGkV0uupjf4wRqqbivc76xIl5_nMR1p4lJotPuLYnFZzEiKj_QZKq-RH638h74A5UAOGONfx7qRmo5sC7LxMlqYQTvqyvKRBaaVSfyBuGqHwG0LIy3NJ_VpgJTVE4mvnsmHtHdHpV0JsgR26LmlhPskpvOnV_VR2pOpsUyhSxn1G7Yo2-NE78BL--lV-ejaElC25jvV66skTxjmFnz0M30_pGpYOEtC5uXHLr9GQksuagnLZEg3OJJbUo1Ht6_Ju7osBkrBLEvXwbrcqLZGvjqlSQcgIh9eFULW0RFdcDvOoYqiOrMklQdCW0qnAH5EkgOSODNDzkrqxCmJ0vvJrjN2ARIeKry6ECw5kYcS1k6rggdUCv64/https%3A%2F%2Fwww.asciihex.com%2Fcharacter%2Fcontrol%2F26%2F0x1A%2Fsub-substitute
(citation with better authority needed)
    A substitute character (␚) is a control character. It is represented in the
    place of a character that is identified to be invalid or incorrect, or in
    cases when it can't be represented on a device used. Besides, it's used in the
    role if an escape sequence in some programming languages.

(Let's not further discuss the CP/M, DOS, and Windows abuse of 0x1a (SUB).)
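Charles's suggested behaviour (one SUB per unconvertible character, never two bytes) is easy to sketch in Python. The helper name `to_ebcdic_with_sub` is hypothetical. It encodes character by character, which is only valid for single-byte targets such as cp037, and replaces anything the code page lacks with U+001A, which cp037 encodes as X'3F'. This illustrates the policy being argued for, not how z/OS FTP is implemented.

```python
def to_ebcdic_with_sub(text, codec="cp037"):
    """Encode text into a single-byte EBCDIC codec, replacing each character
    the codec cannot represent with that codec's SUB control (U+001A)."""
    sub = "\x1a".encode(codec)            # X'3F' in EBCDIC code pages
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(codec)       # representable: exactly one byte
        except UnicodeEncodeError:
            out += sub                    # not representable: one SUB byte
    return bytes(out)

print(to_ebcdic_with_sub("é€").hex())     # 513f: 'é' converts, the euro sign becomes SUB
```

Note that 'é' converts properly here, which is Metz's point: SUB is only for characters genuinely absent from the target.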

Alas, IBM too characteristically flouts common practice:
    SA23-2280-40  UNIX System Services Command Reference
    iconv - Convert characters from one code set to another
    -c  Characters containing conversion errors are not written to the output.
    By default, characters not in the source character set are converted
    to the value 0xff and written to the output.

Does "conversion errors" mean "invalid octet sequences" in the source
as well as characters valid in the source CCSID but having no equivalent
in the target character set?  In the former case, how many 0xff are written?
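How many 0xff an invalid sequence produces depends entirely on how the converter groups the bad input. As one data point (Python, not z/OS iconv), CPython's UTF-8 decoder emits one U+FFFD per maximal ill-formed subsequence. The sketch below, with the hypothetical helper `convert`, chains that decoder with the two policies the man page describes: write 0xff for each error, or drop it as -c does. Treating both kinds of error alike is an assumption, which is exactly what the man page leaves unclear.

```python
def convert(data, target="cp037", drop=False):
    """Hypothetical UTF-8 -> single-byte converter modelled on the iconv man page:
    each conversion error becomes one 0xff byte, or is dropped when drop=True (like -c)."""
    # CPython emits one U+FFFD per maximal ill-formed subsequence;
    # that grouping decides how many 0xff bytes appear.
    text = data.decode("utf-8", errors="replace")
    out = bytearray()
    for ch in text:
        if ch != "\ufffd":
            try:
                out += ch.encode(target)   # single-byte target: one byte per character
                continue
            except UnicodeEncodeError:
                pass                       # valid in UTF-8, but no equivalent in the target
        # A conversion error of either kind. A genuine U+FFFD in the input lands here
        # too, which is harmless for the default cp037 target since it cannot encode it.
        if not drop:
            out.append(0xFF)
    return bytes(out)

print(convert(b"A\xe2\x82B").hex())             # c1ffc2: truncated 3-byte sequence, one 0xff
print(convert(b"A\xff\xfeB").hex())             # c1ffffc2: two invalid bytes, two 0xff
print(convert(b"A\xe2\x82B", drop=True).hex())  # c1c2: the error is simply dropped
```

Another converter could just as defensibly emit one 0xff per bad byte, which is why the documentation ought to say.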

(Is "0xff" possibly a typo for "0x3f"?)

Truly, if the target character set is ASCII-like, the substitute character
should be 0x1a; if EBCDIC-like, 0x3f; if DBCS, ???, but not a single octet.
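For single-byte targets, the rule above needs no table of special cases: encode the Unicode SUB control (U+001A) in the target code page and use whatever byte comes out. A Python sketch; `sub_byte` is a hypothetical name, the codec list is illustrative, and DBCS targets are deliberately out of scope.

```python
def sub_byte(codec):
    """Return the single-byte encoding of SUB (U+001A) in the given codec."""
    encoded = "\x1a".encode(codec)
    if len(encoded) != 1:
        raise ValueError(f"{codec} does not encode SUB as a single octet")
    return encoded[0]

for codec in ("ascii", "iso8859_15", "cp437", "cp037", "cp500", "cp1140"):
    print(f"{codec:>10}: 0x{sub_byte(codec):02x}")
# ascii, iso8859_15 and cp437 give 0x1a; the EBCDIC code pages give 0x3f
```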


    SC14-7314-40  XL C/C++ Runtime Library Reference
says:
    iconv() — Code conversion
    If a sequence of input bytes does not form a valid character in the
    specified encoded character set, conversion stops after the previous
    successfully converted character, and iconv() sets errno to EILSEQ.
    ...
    If iconv() encounters a character in the input buffer that is valid, but
    for which a conversion is not defined in the conversion descriptor, cd,
    then iconv() performs a nonidentical conversion on this character.
    The conversion is implementation-defined.
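The first rule quoted above (stop after the last successfully converted character and set EILSEQ) can be mimicked in Python. This is a sketch of the documented contract only, assuming UTF-8 input and a cp037 target. `iconv_like` is a hypothetical name, and Python's `UnicodeDecodeError.start` stands in for where iconv() would leave its input pointer. The second, implementation-defined rule is precisely what such a sketch cannot show.

```python
import errno

def iconv_like(data, target="cp037"):
    """Convert UTF-8 bytes to a single-byte target, stopping at the first
    invalid input sequence as the documented iconv() contract describes.

    Returns (output bytes, input bytes consumed, errno value or 0).
    POSIX reports a sequence truncated at the end of the buffer as EINVAL;
    this sketch does not make that distinction. Characters valid in UTF-8 but
    absent from the target are the implementation-defined case; here encode()
    simply raises UnicodeEncodeError.
    """
    try:
        text = data.decode("utf-8")
        consumed, err = len(data), 0
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first bad byte; everything before it
        # is well-formed and is converted.
        consumed, err = exc.start, errno.EILSEQ
        text = data[:consumed].decode("utf-8")
    return text.encode(target), consumed, err

out, consumed, err = iconv_like(b"AB\xffCD")
print(out.hex(), consumed, err == errno.EILSEQ)   # c1c2 2 True
```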

This document describes the z/OS implementation and should define
that conversion.

Do iconv and FTP both rely on that behavior?

I feel like filing a couple of RCFs.

-- gil

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN