bug#19242: latest grep considers text files as binary

Jim Meyering Fri, 05 Dec 2014 07:01:36 -0800

On Fri, Dec 5, 2014 at 1:58 AM, Thomas Wolff <t...@computer.org> wrote:
> Paul Eggert wrote:
>>>
>>> the mentioned patches are apparently intended to fix issues in non-UTF-8
>>> locales.
>>
>> No, they're also needed for UTF-8 locales I'm afraid.  There are some
>> security issues, not only having to do with grep's internals, but also for
>> the behavior of downstream programs that may be expecting UTF-8 text.
>>
>> You can work around the problem with 'grep -a'.
>
> I was aware of this workaround but I claim it should not be needed because
> the files affected are in fact not binary files but text files. The manual
> clearly says about -a: "Process a binary file as if it were text" but
> partial content in a different text encoding does not make a file binary.
>
> Jim Meyering wrote:
>>
>>   this is due to documented and desirable behavior.
>
> I deny this is desirable behavior and I doubt there is a security issue as
> described. If any other, independent software has a security issue with
> non-UTF-8 input, it should decide itself to filter it and use accordingly
> stable decoding functions. It cannot be the task of any tool (grep in this
> case) to filter output to work around possible security issues in other
> programs in a pipe. This would be completely against the concept of pipes in
> the Unix tradition.


This is another side effect of using a multibyte locale.
As long as there are no NUL bytes in your input, you can work
around the issue by running grep in the C locale:

  LC_ALL=C grep ...
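
A minimal sketch of that workaround, under a few assumptions: the sample file, its contents, and the search pattern are made up for illustration, and the en_US.UTF-8 locale may not be installed on your system. The file is plain text except for one Latin-1 byte (0xE9), which is not valid UTF-8.

```shell
# Hypothetical sample: ASCII text plus one Latin-1 byte (octal 351 = 0xE9).
sample=$(mktemp)
printf 'caf\351 au lait\nplain line\n' > "$sample"

# In a UTF-8 locale the stray byte is an encoding error, so a recent grep
# may report "Binary file ... matches" instead of printing the line.
# (Assumes en_US.UTF-8 exists; if it does not, grep falls back to the C locale.)
LC_ALL=en_US.UTF-8 grep 'lait' "$sample" || true

# In the C locale every byte is a single-byte character, so the file is
# searched as text, provided it contains no NUL bytes.
c_count=$(LC_ALL=C grep -c 'lait' "$sample")
LC_ALL=C grep 'lait' "$sample"
```

Note that patterns are also matched byte by byte in the C locale, so something like '.' matches a single byte rather than a whole multibyte character.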
