bug#30326: grep not searching through a text file (thinking it binary)

Paul Eggert Fri, 02 Feb 2018 15:45:26 -0800

On 02/02/2018 03:30 PM, L A Walsh wrote:

> most computer files (vs. user-files) are still single-byte.

That's because so many of them are ASCII. But ASCII files are not the issue here. grep's behavior hasn't changed when operating on ASCII files in typical locales. The issue is text using a non-ASCII encoding that is not compatible with your locale; e.g., if your text file uses ISO 8859-1 but your locale specifies UTF-8.
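
To make the mismatch concrete, here is a small Python sketch. It is only an illustration, not grep's actual code: the helper name is_valid_utf8 and the sample word are made up for this example. It shows that pure ASCII bytes and UTF-8 bytes both decode cleanly as UTF-8, while the ISO 8859-1 encoding of the same word contains a byte sequence that is an encoding error under UTF-8. That kind of encoding error is what can lead grep to treat a file as binary in a UTF-8 locale.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same word in three byte representations.
ascii_bytes = b"cafe"                         # plain ASCII, valid in any ASCII-compatible locale
utf8_bytes = "café".encode("utf-8")           # b'caf\xc3\xa9'
latin1_bytes = "café".encode("iso-8859-1")    # b'caf\xe9'

for label, data in [("ASCII", ascii_bytes),
                    ("UTF-8", utf8_bytes),
                    ("ISO 8859-1", latin1_bytes)]:
    print(f"{label}: {data!r} valid as UTF-8? {is_valid_utf8(data)}")
```

Only the ISO 8859-1 bytes fail the check, since the lone 0xE9 byte starts a multibyte UTF-8 sequence that is never completed.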

In my experience, UTF-8 has long been winning this battle, in the sense that UTF-8 is by far the dominant encoding for the non-ASCII files I regularly use. So I use a UTF-8 locale, and suggest this as a good default for most users nowadays.

It's not possible to get direct statistics about encoding for all user files. However, we can see what's being published on the web. Currently UTF-8 is being used by about 90% of public websites whose character encoding can be determined, according to the latest W3Techs survey. ISO 8859-1 is in second place, at about 4%. See:


https://w3techs.com/technologies/overview/character_encoding/all
