On 02/02/2018 03:30 PM, L A Walsh wrote:
> most computer files (vs. user-files) are still single-byte.
That's because so many of them are ASCII. But ASCII files are not the
issue here. grep's behavior hasn't changed when operating on ASCII files
in typical locales. The issue is text in a non-ASCII encoding that is
incompatible with your locale, e.g., a text file that uses ISO 8859-1
when your locale specifies UTF-8.
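
To make the mismatch concrete, here is a minimal Python sketch. It is only an illustration, not how grep is implemented; the helper name classify and the sample word are made up for this example. It shows that pure-ASCII bytes are valid in either encoding, while the ISO 8859-1 bytes for a non-ASCII character are not a valid UTF-8 sequence.

```python
def classify(data: bytes) -> str:
    """Hypothetical helper: report how a byte string looks to a UTF-8 reader."""
    try:
        data.decode("ascii")
        return "ascii"          # valid ASCII is also valid UTF-8 and ISO 8859-1
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"          # non-ASCII, but well-formed UTF-8
    except UnicodeDecodeError:
        return "not utf-8"      # e.g. ISO 8859-1 text with accented letters


# The same word encoded two ways.
latin1_bytes = "café".encode("iso-8859-1")  # b'caf\xe9'
utf8_bytes = "café".encode("utf-8")         # b'caf\xc3\xa9'

for sample in (b"cafe", utf8_bytes, latin1_bytes):
    print(sample, "->", classify(sample))
```

The lone 0xE9 byte in the ISO 8859-1 version is what a program running in a UTF-8 locale sees as an encoding error.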
In my experience, UTF-8 has long been winning this battle, in the sense
that UTF-8 is by far the dominant encoding for the non-ASCII files I
regularly use. So I use a UTF-8 locale, and suggest this as a good
default for most users nowadays.
It's not possible to get direct statistics about encoding for all user
files. However, we can see what's being published on the web. Currently
UTF-8 is being used by about 90% of public websites whose character
encoding can be determined, according to the latest W3Techs survey. ISO
8859-1 is in second place, at about 4%. See: