Pierre Labastie wrote:
On 06/03/2016 21:19, Bruce Dubbs wrote:
The most recent version of grep is causing problems. If it processes a file
that has a character that is not valid in the current locale (LANG), it stops
processing and outputs "Binary file <name> matches".
This problem came up in building lxqt as the .desktop files have a lot of
characters for different languages. I think these are all utf-8, but I'm not
sure.
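
A minimal sketch of the behaviour described above, for anyone who wants to reproduce it. The sample file and its contents are made up for illustration; whether the plain grep call prints the "Binary file" message depends on the grep version and the active locale, so no output is claimed for it.

```shell
# Hypothetical sample: a .desktop-style file containing the byte 0xE9
# (Latin-1 "é"), which is not a valid UTF-8 sequence on its own.
sample=$(mktemp)
printf 'Name=Caf\351\nComment=plain ascii\n' > "$sample"

# Depending on the grep version and the active locale, this may print
# "Binary file <name> matches" instead of the matching line.
grep Name "$sample"

# --text (-a) forces grep to treat the file as text; -c counts matching lines.
text_count=$(grep -a -c Name "$sample")
echo "matching lines with -a: $text_count"

rm -f "$sample"
```

With -a the line is always reported, whatever the locale says about the stray byte.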
There are several ways to work around this problem.
1. export GREP_OPTIONS=--text or its equivalent GREP_OPTIONS=-a
The man page says
GREP_OPTIONS
This variable specifies default options to be placed in front of any
explicit options. As this causes problems when writing portable scripts, this
feature will be removed in a future release of grep, and grep warns if it is
used. Please use an alias or script instead.
2. I don't think an alias would be a good option as that would not be picked
up in scripts.
alias grep='grep -a'
3. We could create a wrapper script, as is done for yacc.
cd /bin
mv grep grep.orig
cat > grep << "EOF"
#! /bin/sh
# Wrapper: always treat input as text
exec /bin/grep.orig --text "$@"
EOF
chmod 755 grep
But there are times when we do not want the --text behavior.
4. export LANG=en_US.utf8 where necessary.
The problem here is trying to pick up all the places where it is necessary.
If a user already has LANG set to a value like fr_FR.utf8, I don't think it
would be needed. It also would not solve the problem if there are non-utf8
characters in the file being searched.
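
To illustrate the concern raised under option 2, here is a small sketch showing that an alias defined in the current shell is not visible to a script, because the script runs in a new shell process. The message text is only illustrative.

```shell
# Define the alias in the current shell, as in option 2.
alias grep='grep -a'

# A script runs in a new shell process, which does not inherit aliases.
# "alias grep" prints the definition if one exists and fails otherwise.
in_child=$(sh -c 'alias grep 2>/dev/null || echo "no alias in child shell"')
echo "$in_child"

# Clean up so the alias does not leak into the rest of the session.
unalias grep
```

Environment variables such as LANG are inherited by child processes, which is why option 4 reaches scripts while the alias does not.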
I'll note that I have already addressed this in
http://www.linuxfromscratch.org/blfs/view/svn/postlfs/cacerts.html
where I had to add export LANG=en_US.utf8 to /usr/bin/make-ca.sh.
For right now, I'm going to go with 4, but am not totally happy with that
solution.
Feedback appreciated.
I realize that, although I set LANG=fr_FR.UTF-8, jhalfs sets LC_ALL=C (this
can be changed, but I only realized this just now). That may be why I got issues
with lxqt .desktop files. OTOH, from man grep:
---------------------
Within a bracket expression, a range expression consists of two characters
separated by a hyphen. It matches any single character that sorts between
the two characters, inclusive, using the locale's collating sequence and
character set. For example, in the default C locale, [a-d] is equivalent
to [abcd]. Many locales sort characters in dictionary order, and in these
locales [a-d] is typically not equivalent to [abcd]; it might be equivalent
to [aBbCcDd], for example. To obtain the traditional interpretation of
bracket expressions, you can use the C locale by setting the LC_ALL
environment variable to the value C.
--------------------
So if a package build system relies on the LC_ALL=C behavior, "4" could lead
to issues...
But I do not have a better alternative to propose.
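
As a small illustration of the man page excerpt, the following sketch shows the traditional C-locale interpretation of a range. The input letters are arbitrary; what the same range matches in other locales depends on the system's locale data, so only the C result is shown.

```shell
# In the C locale the range [a-d] covers exactly a, b, c and d,
# so the uppercase lines are not matched.
c_matches=$(printf 'a\nB\nc\nD\n' | LC_ALL=C grep '[a-d]')
echo "$c_matches"
# prints:
# a
# c
```

A build system that counts on this result could see different matches once the collation locale changes.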
Hmm. Setting LANG does change LC_COLLATE and every other LC_* variable that
is not set explicitly, but it does not override LC_ALL. In a test, I see that
we really only need to set LC_CTYPE=en_US.utf8 to make grep work properly for
what we need.
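
One way the LC_CTYPE-only idea might be applied per command is sketched below. The helper name grep_utf8 is hypothetical, and the sketch assumes the en_US.utf8 locale is installed. Since LC_ALL takes precedence over LC_CTYPE when it is set (as jhalfs does), the helper clears it for that one call only.

```shell
# Hypothetical helper (name is illustrative, not from the thread).
grep_utf8() (
    # LC_ALL, when set, takes precedence over LC_CTYPE, so drop it inside
    # this subshell only; the caller's environment is untouched.
    unset LC_ALL
    LC_CTYPE=en_US.utf8 grep "$@"
)

# Usage: count lines containing "Name" in some sample input.
printf 'Name=abc\nComment=xyz\n' | grep_utf8 -c Name
```

LC_COLLATE and LANG are left as the caller set them, so only the character classification changes for that invocation.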
-- Bruce