On Fri, Dec 5, 2014 at 1:58 AM, Thomas Wolff <t...@computer.org> wrote:
> Paul Eggert wrote:
>>>
>>> the mentioned patches are apparently intended to fix issues in non-UTF-8
>>> locales.
>>
>> No, they're also needed for UTF-8 locales I'm afraid. There are some
>> security issues, not only having to do with grep's internals, but also for
>> the behavior of downstream programs that may be expecting UTF-8 text.
>>
>> You can work around the problem with 'grep -a'.
>
> I was aware of this workaround but I claim it should not be needed because
> the files affected are in fact not binary files but text files. The manual
> clearly says about -a: "Process a binary file as if it were text" but
> partial content in a different text encoding does not make a file binary.
>
> Jim Meyering wrote:
>>
>> this is due to documented and desirable behavior.
>
> I deny this is desirable behavior and I doubt there is a security issue as
> described. If any other, independent software has a security issue with
> non-UTF-8 input, it should decide itself to filter it and use accordingly
> stable decoding functions. It cannot be the task of any tool (grep in this
> case) to filter output to work around possible security issues in other
> programs in a pipe. This would be completely against the concept of pipes in
> the Unix tradition.

This is another side effect of using a multibyte locale.
As long as there are no NUL bytes in your input, you can work
around the issue by running grep in the C locale:

    LC_ALL=C grep ...
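
To make the workaround concrete, here is a minimal sketch. The wrapper name grep_c, the sample file contents and the pattern are all made up for illustration; only the LC_ALL=C prefix comes from the suggestion above. What the first grep prints depends on your grep version and on the locale you run it in, so no output is shown.

```shell
#!/bin/sh
# Hypothetical helper (not part of grep): run grep in the C locale, so
# input bytes are not decoded as multibyte characters.
grep_c() {
    LC_ALL=C grep "$@"
}

# Illustrative sample: a text file whose first line contains the Latin-1
# byte 0xE9 (octal 351), which is not a valid UTF-8 sequence on its own.
sample=$(mktemp)
printf 'caf\351 latte\nplain text\n' > "$sample"

# In a UTF-8 locale, some grep versions treat this file as binary and
# report only that it matches, instead of printing the line.
grep latte "$sample"

# In the C locale the matching line itself is printed.
# Assumption carried over from above: the input contains no NUL bytes.
grep_c latte "$sample"

rm -f "$sample"
```

Setting LC_ALL on the command line affects only that one grep invocation, so the rest of the shell session and any other commands in the pipeline keep their normal locale.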