On Sun, Mar 06, 2016 at 10:38:41PM +0100, Pierre Labastie wrote:
> On 06/03/2016 21:19, Bruce Dubbs wrote:
> > The most recent version of grep is causing problems.  If it processes a file
> > that has a character that is not valid in the current locale (LANG), it stops
> > processing and outputs "Binary file <name> matches".
> > 
> > This problem came up in building lxqt as the .desktop files have a lot of
> > characters for different languages.  I think these are all utf-8, but I'm
> > not sure.
> > 
> > There are several ways to work around this problem.
> > 
> > 1.  export GREP_OPTIONS=--text or its equivalent GREP_OPTIONS=-a
> > 
> > The man page says
> > 
> > GREP_OPTIONS
> >    This variable specifies default options to be placed in front of any
> > explicit options. As this causes problems when writing portable scripts,
> > this feature will be removed in a future release of grep, and grep warns if
> > it is used. Please use an alias or script instead.
> > 
> > 2. I don't think an alias would be a good option as that would not be picked
> > up in scripts.
> > 
> > alias grep='grep -a'
> > 
> > 3.  We could create a script like yacc.
> > 
> > cd /bin
> > mv grep grep.orig
> > # Quote the delimiter so "$@" is written literally, not expanded now
> > cat > grep << "EOF"
> > #!/bin/sh
> > exec /bin/grep.orig --text "$@"
> > EOF
> > chmod 755 grep
> > 
> > But there are times when we do not want the --text behavior.
> > 
> > 4.  export LANG=en_US.utf8 where necessary.
> > 
> > The problem here is trying to pick up all the places where it is necessary. 
> > If a user already has LANG set to a value like fr_FR.utf8, I don't think it
> > would be needed.  It also would not solve the problem if there are non-utf8
> > characters in the file being searched.
> > 
> > I'll note that I have already addressed this in
> > 
> >   http://www.linuxfromscratch.org/blfs/view/svn/postlfs/cacerts.html
> > 
> > where I had to add export LANG=en_US.utf8 to /usr/bin/make-ca.sh.
> > 
> > For right now, I'm going to go with 4, but am not totally happy with that
> > solution.
> > 
> > Feedback appreciated.
> > 
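
On the point that --text is not always wanted: it can also be passed on a
single call instead of globally. A minimal sketch, assuming GNU grep; the
sample file and its contents are made up for illustration:

```shell
# Build a small sample file containing one byte (octal 351, 0xE9) that is
# neither ASCII nor a valid UTF-8 sequence on its own.
sample=$(mktemp)
printf 'Name=caf\351\nComment=plain\n' > "$sample"

# --text (-a) makes grep treat the file as text for this one call only,
# so the matching line is printed rather than a "Binary file" notice.
match=$(LC_ALL=C grep -a '^Name=' "$sample")
printf '%s\n' "$match"

rm -f "$sample"
```

This leaves every other grep call on the system with its default behavior.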
> 
> I realize that, although I set LANG=fr_FR.UTF-8, jhalfs sets LC_ALL=C (this
> can be changed, but I only just realized this). That may be why I got issues
> with lxqt .desktop files. OTOH, from man grep:
> ---------------------
> Within  a bracket expression, a range expression consists of two characters
> separated by a hyphen.  It matches any single character that sorts  between
> the  two  characters,  inclusive, using the locale's collating sequence and
> character set.  For example, in the default C locale, [a-d]  is  equivalent
> to  [abcd]. Many locales sort characters in dictionary order, and in these
> locales [a-d] is typically not equivalent to [abcd]; it might be equivalent
> to  [aBbCcDd],  for  example.   To obtain the traditional interpretation of
> bracket expressions, you can  use  the  C  locale  by  setting  the  LC_ALL
> environment variable to the value C.
> --------------------
> So if a package build system relies on the LC_ALL=C behavior, "4" could lead
> to issues...
> 
> But I do not have a better alternative to propose.
> 
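
The LANG versus LC_ALL interaction above follows the POSIX precedence rule:
LC_ALL wins over LC_CTYPE, which wins over LANG. A rough sketch of that rule
in plain sh; effective_ctype is a hypothetical helper name, not something
grep or jhalfs provides:

```shell
# Print the locale name that LC_CTYPE-sensitive tools such as grep will use,
# following the POSIX precedence: LC_ALL, then LC_CTYPE, then LANG.
# Unset and empty variables are both skipped.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Nothing set: the implementation default, normally the C locale.
        printf 'C\n'
    fi
}
```

With LC_ALL=C and LANG=fr_FR.UTF-8 both set, this prints C, so exporting
LANG alone has no effect in a build that already exports LC_ALL.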
As a general rule, I think 4 is probably the best option - perhaps
we could have a copy member to explain it, and in that we could
mention that any installed UTF-8 locale should be fine?
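
As a rough illustration of "any installed UTF-8 locale", something like the
following could pick one. This is only a sketch: pick_utf8_locale is a
hypothetical name, and it assumes `locale -a` is available and that grep
supports -m (GNU grep does).

```shell
# Print the first installed locale whose name ends in a UTF-8 codeset
# (for example en_US.utf8 or fr_FR.UTF-8). Prints nothing and returns
# non-zero if no such locale is found or `locale` is missing.
pick_utf8_locale() {
    locale -a 2>/dev/null | LC_ALL=C grep -a -i -m 1 -E '\.utf-?8$'
}

# Example use: report the choice rather than changing the environment.
if utf8_locale=$(pick_utf8_locale); then
    printf 'would use LANG=%s\n' "$utf8_locale"
fi
```

A script could then set LANG to that value for just the commands that need
it, keeping in mind that LC_ALL, if set, still takes precedence.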

ĸen
-- 
This email was written using 100% recycled letters.
