On Sat, May 31, 2025 at 10:45:17AM -0000, Stuart Henderson wrote:
> ggrep does in this instance, but I don't know how reliable that is.

I had already forgotten about a problem I ran into with GNU grep
under Linux a long time ago, while writing a shell script to process
mbox files.  Some of the messages in my mbox files were ISO Latin-1
encoded (Spanish).  Since my locale was UTF-8, a grep command in a pipe
at the end of my script printed the message "binary file matches" and
dropped from the output every line containing invalid UTF-8 sequences,
treating them as garbage from a binary file.  This is what still happens
under Linux (\xed is Latin-1 iacute):

  $ printf '\xedHello\n' > test
  $ grep Hello test
  grep: test: binary file matches
  $ LANG=C grep Hello test
  �Hello

I mention this as a practical example of the trade-offs of using
wide-character functions.
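
For scripts bitten by this, the workaround shown above can be wrapped
up as follows.  This is only a sketch: grep_bytes and latin1.txt are
names made up for the illustration, and LC_ALL is used instead of LANG
on the assumption that LC_ALL or LC_CTYPE might already be set in the
environment, in which case they would take precedence over LANG.  The
byte is written in octal (\355) because not every printf understands
\x escapes.

```shell
# Hypothetical helper: run grep in the C locale so that every byte is
# treated as a valid single-byte character and Latin-1 text is not
# mistaken for binary data.
grep_bytes() {
    LC_ALL=C grep "$@"
}

# \355 is the octal spelling of \xed (Latin-1 iacute).
printf '\355Hello\n' > latin1.txt

# Prints the matching line, raw Latin-1 byte included.
grep_bytes Hello latin1.txt
```

GNU grep's -a (--text) option is another way to make it treat such
input as text, at the cost of possibly sending raw bytes to the
terminal.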



-- 
Walter
