On 02/02/2018 03:30 PM, L A Walsh wrote:
> most computer files (vs. user-files) are still single-byte.


That's because so many of them are ASCII. But ASCII files are not the issue here. grep's behavior hasn't changed when operating on ASCII files in typical locales. The issue is text using a non-ASCII encoding that is not compatible with your locale; e.g., if your text file uses ISO 8859-1 but your locale specifies UTF-8.
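To make the mismatch concrete, here is a small Python sketch (not anything grep itself does; the helper name decodes_as and the sample strings are made up for illustration). It checks whether the same text, stored in different encodings, is valid when read as UTF-8, which is roughly the question a UTF-8 locale poses about a file's bytes.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data is valid text in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

# Hypothetical sample lines, chosen only for illustration.
ascii_line = "plain text".encode("ascii")
latin1_line = "café".encode("iso-8859-1")  # é is the single byte 0xE9
utf8_line = "café".encode("utf-8")         # é is the two bytes 0xC3 0xA9

print(decodes_as(ascii_line, "utf-8"))   # ASCII bytes are also valid UTF-8
print(decodes_as(latin1_line, "utf-8"))  # a lone 0xE9 is not valid UTF-8
print(decodes_as(utf8_line, "utf-8"))
```

The ASCII line is valid either way, which is why ASCII files are unaffected; only the ISO 8859-1 line fails when interpreted as UTF-8.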

In my experience, UTF-8 has long been winning this battle, in the sense that UTF-8 is by far the dominant encoding for the non-ASCII files I regularly use. So I use a UTF-8 locale, and suggest this as a good default for most users nowadays.

It's not possible to get direct statistics about encoding for all user files. However, we can see what's being published on the web. Currently UTF-8 is being used by about 90% of public websites whose character encoding can be determined, according to the latest W3Techs survey. ISO 8859-1 is in second place, at about 4%. See:

https://w3techs.com/technologies/overview/character_encoding/all



