On Fri, 20 Sep 2019 17:51:31 +0000, Seymour J Metz wrote:

>FSVO "autodetect". Windoze expects to see an extraneous zero-width 
>non-breaking space as the first character of a file using UTF-8; that, of
>course, will break any software that is not expecting it. The idea of jamming
>in an extraneous character as a byte-order mark when the issue of byte order 
>has no relevance is something that only the developers of edge (ptui!) could 
>love. If you have a valid source file for a language whose compiler does not 
>allow an extraneous initial character, then windoze will not autodetect it as
>UTF-8.
>
It's ironic.  In other contexts I've known Windows partisans to sneer at
the UNIX tradition of "magic numbers", of which the BOM is an example.

And it's not probative.  A file starting with the BOM octets could as well
be ISO8859-x.
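
To illustrate the point, here is a minimal Python sketch (my own example,
not anything Windows does) showing that the three BOM octets decode
cleanly under ISO 8859-1 as well as under UTF-8:

```python
import codecs

BOM = codecs.BOM_UTF8  # the three octets EF BB BF

# Read as ISO 8859-1, the octets are three ordinary printable characters.
as_latin1 = BOM.decode("iso-8859-1")

# Read as UTF-8, the same octets are the single code point U+FEFF.
as_utf8 = BOM.decode("utf-8")

print(BOM.hex(), repr(as_latin1), repr(as_utf8))
```

Both decodes succeed, so the octets alone cannot settle which encoding
was intended.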

The detection must be Bayesian/heuristic.  A file of significant size containing
numerous non-ASCII characters, all in valid UTF-8 sequences, is
highly likely to be UTF-8.  A file containing only ASCII characters is
ambiguous: it doesn't matter whether it's considered ASCII or UTF-8.
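
A rough sketch of such a heuristic in Python follows.  The function name
classify, the result labels, and the min_non_ascii threshold are all
hypothetical choices of mine for illustration; a real detector would
weigh the evidence more carefully.

```python
def classify(data: bytes, min_non_ascii: int = 3) -> str:
    """Guess whether data is UTF-8 (labels and threshold are illustrative)."""
    # Pure ASCII is ambiguous: it is valid as ASCII and as UTF-8 alike.
    if all(b < 0x80 for b in data):
        return "ascii-or-utf8"
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # At least one invalid sequence: not well-formed UTF-8.
        return "not-utf8"
    # The more non-ASCII characters that decode cleanly, the stronger the evidence.
    non_ascii = sum(1 for ch in text if ord(ch) > 0x7F)
    return "likely-utf8" if non_ascii >= min_non_ascii else "possibly-utf8"


if __name__ == "__main__":
    sample = "na\u00efve caf\u00e9 r\u00e9sum\u00e9"
    print(classify(b"plain text"))
    print(classify(sample.encode("utf-8")))
    print(classify(sample.encode("iso-8859-1")))
```

The same text encoded as ISO 8859-1 is rejected because its high-bit
octets do not form valid UTF-8 sequences; that asymmetry is what makes the
heuristic workable without a BOM.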

Hungarian notation is unlikely to be a solution -- programmers will
not embrace a filename extension such as ".U8TXT".  And MIME
metadata is not usually available.

>>Does your Windows system support UTF-8?  It's prevalent and I've
>>known Windows programs to auto-detect it.

-- gil

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
