Dear all,
I'd like to draw your attention today to
RFC 2640, Internationalization of the File Transfer Protocol. B. Curtin.
July 1999. (Format: TXT=57204 bytes) (Updates RFC0959) (Status:
PROPOSED STANDARD)
The original FTP standard restricted path names of transferred files to
7-bit ASCII. RFC 2640 extends that to UTF-8, such that non-ASCII
pathnames can now be used in a standardized interoperable way.
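As a small illustration (my own sketch, not text from the RFC): a client that wants to send a non-ASCII path name would encode the command line as UTF-8 before writing it to the control connection. The helper name ftp_command_bytes and the example file name are made up for this message.

```python
def ftp_command_bytes(verb: str, pathname: str) -> bytes:
    """Build one FTP control-connection line with the path name in UTF-8."""
    # Hypothetical helper for illustration; commands end in CRLF as in RFC 959.
    return f"{verb} {pathname}".encode("utf-8") + b"\r\n"


# U+00E9 (e with acute accent) becomes the two octets C3 A9 on the wire.
line = ftp_command_bytes("RETR", "r\u00e9sum\u00e9.txt")
print(line)
```

A pure ASCII path name produces exactly the same octets as before, which is why old clients and servers keep working for such names.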
This RFC does *not* address another possible extension, namely of text-mode
file transfer. Text mode was originally introduced to provide automatic
ASCII<->EBCDIC or CRLF<->LF conversion, and it could be extended to generic
any-to-any charset conversion via UTF-8, as Kermit already offers. So
files transmitted by FTP remain encoded in some user-defined format; only
the filenames are now required to be in UTF-8.
I think that this standard is yet another reason why, under Linux, no
encodings other than ASCII or UTF-8 should ever be used in path names.
It might be useful to add an option to the "find" command that finds
path names with malformed UTF-8 sequences, so that FTP archives can
be scanned quickly and conveniently for conformance.
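Until "find" has such an option, a short script can do a similar scan. The following is only a rough sketch in Python, not a proposal for the actual "find" interface; the function names is_valid_utf8 and find_malformed_paths are my own invention. It walks a directory tree using raw byte names and reports every entry whose name does not decode as strict UTF-8 (which also rejects overlong forms and encoded surrogates).

```python
import os


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw file name decodes as strict UTF-8."""
    try:
        name.decode("utf-8")  # strict mode rejects malformed sequences
        return True
    except UnicodeDecodeError:
        return False


def find_malformed_paths(root: bytes):
    """Yield each path below root whose last component is not valid UTF-8."""
    # Passing bytes to os.walk makes it return undecoded byte names.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_valid_utf8(name):
                yield os.path.join(dirpath, name)
```

To scan an archive, one would call something like find_malformed_paths(os.fsencode("/home/ftp/pub")) (the directory is just an example) and print the resulting byte strings. Plain ASCII names pass the check, since ASCII is a subset of UTF-8.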
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/