Typo, that should be ISO-8859-15, Latin-9.
-- Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

________________________________________
From: IBM Mainframe Discussion List <IBM-MAIN@LISTSERV.UA.EDU> on behalf of Paul Gilmartin <0000000433f07816-dmarc-requ...@listserv.ua.edu>
Sent: Tuesday, November 17, 2020 10:12 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: FTP converting between UTF-8 and EBCDIC

On Tue, 17 Nov 2020 13:29:32 +0000, Seymour J Metz wrote:

>NFW; unless the documentation describes such bizarre behavior, it should *NOT*
>translate characters to SUB when there is a correct translation. If you want
>to preserve the length then use a character set in which all characters are 8
>bits, e.g., ISO-8869=15.

"8869="?  Pedant!  I easily presumed "not in the target character set" as
Charles's intent.

>________________________________________
>From: Charles Mills
>Sent: Monday, November 16, 2020 4:14 PM
>
>If you tell FTP that the non-EBCDIC file is UTF-8 then FTP *should* convert
>accented characters and such to EBCDIC SUB (X'3F') rather than to two bytes.
>Should. YMMV.

I understand this to be common practice, as described in:
https://www.asciihex.com/character/control/26/0x1A/sub-substitute
(citation with better authority needed)

    A substitute character (␚) is a control character. It is represented in
    the place of a character that is identified to be invalid or incorrect,
    or in cases when it can't be represented on a device used. Besides, it is
    used in the role of an escape sequence in some programming languages.
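The length-preserving substitution Charles describes can be sketched in C. The translation table below is a hypothetical, deliberately tiny fragment (the byte values assume ISO-8859-1 on the source side and IBM-1047 on the target side); a real implementation would carry a full 256-entry table, but the point is the same: each source byte yields exactly one target byte, with EBCDIC SUB (X'3F') for anything unmapped.

```c
#include <stddef.h>

#define EBCDIC_SUB 0x3F  /* SUB in EBCDIC, analogous to 0x1A in ASCII */

/* Hypothetical length-preserving translation: each source byte either
   maps through the table or becomes EBCDIC SUB -- never two bytes. */
static unsigned char to_ebcdic(unsigned char c) {
    /* Illustrative fragment of an ISO-8859-1 -> IBM-1047 table. */
    switch (c) {
    case 'A':  return 0xC1;
    case 'B':  return 0xC2;
    case 'a':  return 0x81;
    case 0xE9: return 0x51;        /* e-acute has a defined mapping */
    default:   return EBCDIC_SUB;  /* no mapping in this fragment: SUB */
    }
}

void translate(const unsigned char *src, unsigned char *dst, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = to_ebcdic(src[i]);
}
```

Because the output length always equals the input length, record lengths survive the transfer, which is exactly the property a multi-byte UTF-8 expansion would break.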
(Let's not further discuss the CP/M-DOS-Windows abuse of 0x1A (SUB).)

Alas, IBM too characteristically flouts common practice:

SA23-2280-40  UNIX System Services Command Reference
    iconv - Convert characters from one code set to another
    -c  Characters containing conversion errors are not written to the
        output. By default, characters not in the source character set
        are converted to the value 0xff and written to the output.

Does "conversion errors" mean "invalid octet sequences" in the source, as
well as characters valid in the source CCSID but having no equivalent in
the target character set?  In the former case, how many 0xff are written?
(Is "0xff" possibly a typo for "0x3f"?)  Truly, if the target character
set is ASCII-like the substitute character should be 0x1a; if EBCDIC-like,
0x3f; if DBCS, ???; but not a single octet.

SC14-7314-40  XL C/C++ Runtime Library Reference says:
    iconv() — Code conversion
    If a sequence of input bytes does not form a valid character in the
    specified encoded character set, conversion stops after the previous
    successfully converted character, and iconv() sets errno to EILSEQ.
    ...
    If iconv() encounters a character in the input buffer that is valid,
    but for which a conversion is not defined in the conversion
    descriptor, cd, then iconv() performs a nonidentical conversion on
    this character. The conversion is implementation-defined.

This document describes the z/OS implementation and should define that
conversion.  Do iconv and FTP both rely on that behavior?  I feel like a
couple of RCFs.

-- gil

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN