On Fri, 20 Sep 2019 17:51:31 +0000, Seymour J Metz wrote:

>FSVO "autodetect". Windoze expects to see an extraneous zero-width
>non-breaking space as the first character of a file using UTF-8; that, of
>course, will break any software that is not expecting it. The idea of jamming
>in an extraneous character as a byte-order mark when the issue of byte order
>has no relevance is something that only the developers of edge (ptui!) could
>love. If you have a valid source file for a language whose compiler does not
>allow an extraneous initial character, then windoze will not autodetect it as
>UTF-8.
>
It's ironic. In other contexts I've known Windows partisans to sneer
at the UNIX tradition of "magic numbers" which the BOM represents.
And it's not probative. A file starting with the BOM octets could as
well be ISO8859-x. The detection must be Bayesian/heuristic. A file
of significant size containing numerous non-ASCII characters, but only
in valid UTF-8 sequences, is highly likely to be UTF-8. A file containing
only ASCII characters is ambiguous: it doesn't matter whether it's
considered ASCII or UTF-8.

Hungarian notation is unlikely to be a solution -- programmers will
not embrace a filename extension such as ".U8TXT". And MIME metadata
is not usually available.

>>Does your Windows system support UTF-8? It's prevalent and I've
>>known Windows programs to auto-detect it.

-- gil
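
The heuristic described above (strictly valid UTF-8 plus a number of non-ASCII characters suggests UTF-8; pure ASCII is ambiguous; a BOM alone proves little) can be sketched roughly in Python. This is only an illustration under stated assumptions: the function name guess_encoding, the three-way labels, and the min_non_ascii threshold are all hypothetical, and a real detector would weigh the evidence more carefully.

```python
def guess_encoding(data: bytes, min_non_ascii: int = 3) -> str:
    """Heuristically label a byte string; names and threshold are illustrative.

    Returns one of:
      "ascii"         - only 7-bit bytes; ASCII vs. UTF-8 makes no difference
      "probably-utf8" - valid UTF-8 with at least min_non_ascii non-ASCII chars
      "possibly-utf8" - valid UTF-8, but too few non-ASCII chars to be confident
      "not-utf8"      - contains a byte sequence that is invalid in UTF-8
    """
    try:
        # Strict decoding rejects any malformed UTF-8 sequence.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf8"

    # Count characters outside the 7-bit ASCII range (a leading BOM counts as one).
    non_ascii = sum(1 for ch in text if ord(ch) > 0x7F)

    if non_ascii == 0:
        return "ascii"
    if non_ascii >= min_non_ascii:
        return "probably-utf8"
    return "possibly-utf8"


if __name__ == "__main__":
    samples = {
        "plain ASCII": b"hello",
        "UTF-8 text": "naïve café déjà vu".encode("utf-8"),
        "ISO 8859-1 text": "naïve café".encode("latin-1"),
        "BOM + ASCII": b"\xef\xbb\xbfhello",
    }
    for name, raw in samples.items():
        print(f"{name}: {guess_encoding(raw)}")
```

The "BOM + ASCII" sample is labelled only "possibly-utf8" because the same three octets are also legal ISO 8859-1 (they decode there as the characters "ï»¿"), which is why the BOM by itself is weak evidence in this sketch.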