Maintainers,

I recently was confused as to why GNU grep did not find any matches in
certain files, when vim clearly showed that the search string was present.

It turns out the files (log files from a Windows application) are encoded in
UTF-16LE.

> $ file '06-21-2023 03-22-46'
> 06-21-2023 03-22-46: Unicode text, UTF-16, little-endian text, with CRLF line terminators


> $ /bin/od -Ad -w16 -t cz '06-21-2023 03-22-46' | head -10
> 0000000 377 376   [  \0   H  \0   E  \0   A  \0   D  \0   E  \0   R  \0  >..[.H.E.A.D.E.R.<
> 0000016   :  \0   ]  \0  \r  \0  \n  \0   [  \0   I  \0   D  \0   r  \0  >:.].....[.I.D.r.<
> 0000032   i  \0   v  \0   e  \0      \0   v  \0   e  \0   r  \0   s  \0  >i.v.e. .v.e.r.s.<
> 0000048   i  \0   o  \0   n  \0   :  \0      \0   6  \0   .  \0   7  \0  >i.o.n.:. .6...7.<
> 0000064   .  \0   4  \0   .  \0   4  \0   6  \0      \0   R  \0   e  \0  >..4...4.6. .R.e.<
> 0000080   l  \0   e  \0   a  \0   s  \0   e  \0      \0   D  \0   a  \0  >l.e.a.s.e. .D.a.<
> 0000096   t  \0   e  \0   :  \0      \0   0  \0   6  \0   /  \0   1  \0  >t.e.:. .0.6./.1.<
> 0000112   6  \0   /  \0   2  \0   0  \0   2  \0   3  \0   ]  \0  \r  \0  >6./.2.0.2.3.]...<
> 0000128  \n  \0   [  \0   I  \0   n  \0   t  \0   e  \0   r  \0   a  \0  >..[.I.n.t.e.r.a.<
> 0000144   c  \0   t  \0   i  \0   v  \0   e  \0      \0   B  \0   a  \0  >c.t.i.v.e. .B.a.<


> $ grep --version
> grep (GNU grep) 3.11
> Packaged by Cygwin (3.11-1)
> Copyright (C) 2023 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.


There is no easy way to use grep to search these files, even if one knows
the encoding in advance.

I would like to request a feature that lets grep transparently decode
UTF-16LE files so they can be searched conveniently.
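
To illustrate the behaviour I have in mind, here is a rough sketch of a
wrapper that approximates it today. It is not part of grep: the name grep16
is hypothetical, and it assumes that iconv is installed and that each file
starts with a byte-order mark (the 377 376 bytes in the od output above), so
that iconv can pick the byte order itself. It relies on GNU grep's --label
option to print the real file name.

```shell
# grep16: hypothetical helper, not a GNU grep feature.
# Usage: grep16 PATTERN FILE...
grep16() {
    pattern=$1
    shift
    status=1
    for f in "$@"; do
        # "UTF-16" lets iconv choose the byte order from the BOM;
        # tr drops the CR of each CRLF line ending.
        if iconv -f UTF-16 -t UTF-8 "$f" | tr -d '\r' |
            grep -H --label="$f" -e "$pattern"; then
            status=0
        fi
    done
    # 0 if any file matched, 1 otherwise, like grep itself.
    return "$status"
}
```

It would be called as: grep16 'IDriver version' '06-21-2023 03-22-46'.
Having to know the encoding beforehand and wrap every search like this is
the inconvenience the requested feature would remove.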

Thanks,
Jeremy Hetzler
(he/him)
