On Fri, 20 Sep 2019 17:51:31 +0000, Seymour J Metz wrote:

>FSVO "autodetect". Windoze expects to see an extraneous zero-width 
>non-breaking space as the first character of a file using UTF-8; that of 
>course, will break any software that is not expecting it. The idea of jamming 
>in an extraneous character as a byte-order mark when the issue of byte order 
>has no relevance is something that only the developers of edge (ptui!) could 
>love. If you have a valid source file for a language whose compiler does not 
>allow an extraneous initial character, then windoze will not autodetect it as 
It's ironic.  In other contexts I've known Windows partisans to sneer at
the UNIX tradition of "magic numbers", of which the BOM is an example.

And it's not probative.  A file starting with the BOM octets could as well
be ISO8859-x.
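
To make that concrete, here is a minimal Python sketch. It assumes
ISO8859-1 as the representative ISO8859-x part and shows that the same
three octets decode without error under either reading:

```python
import codecs

# The UTF-8 encoding of U+FEFF, the so-called BOM: EF BB BF.
bom = codecs.BOM_UTF8

# Read as UTF-8, the three octets are one zero-width no-break space.
as_utf8 = bom.decode("utf-8")

# Read as ISO8859-1, the same octets are three ordinary printable
# characters, so their presence alone proves nothing.
as_latin1 = bom.decode("iso-8859-1")

print(repr(as_utf8))    # '\ufeff'
print(repr(as_latin1))  # 'ï»¿'
```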

The detection must be Bayesian/heuristic.  A file of significant size
containing numerous non-ASCII characters, but only in valid UTF-8
sequences, is highly likely to be UTF-8.  A file containing only ASCII
characters is ambiguous, but harmlessly so: it doesn't matter whether
it's considered ASCII or UTF-8.
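
A rough sketch of such a heuristic in Python follows.  The function
name guess_encoding and the three labels it returns are hypothetical,
chosen only for illustration; a real detector would also weigh file
size and the count of non-ASCII characters rather than giving a flat
answer.

```python
def guess_encoding(data: bytes) -> str:
    """Classify a byte string using the heuristic described above.

    Hypothetical helper: the name and return labels are illustrative.
    """
    try:
        # Python's UTF-8 decoder is strict by default: it rejects
        # stray continuation bytes, truncated sequences, and overlong
        # forms.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf-8"

    # Count characters outside the ASCII range.
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    if non_ascii == 0:
        # Pure ASCII: the two readings are byte-for-byte identical.
        return "ascii-or-utf-8"
    # Non-ASCII present, and every sequence was well formed.
    return "probably-utf-8"


print(guess_encoding(b"plain text"))
print(guess_encoding("na\u00efve caf\u00e9".encode("utf-8")))
print(guess_encoding("na\u00efve".encode("iso-8859-1")))
```

The last call fails UTF-8 validation because the ISO8859-1 octet EF
announces a three-octet sequence but is followed by an ASCII letter.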

Hungarian notation is unlikely to be a solution -- programmers will
not embrace a filename extension such as ".U8TXT".  And MIME
metadata is not usually available.

>>Does your Windows system support UTF-8?  It's prevalent and I've
>>known Windows programs to auto-detect it.

-- gil

