Hi, I have a locale directory and an older locale-bak directory in a project's source directory. They both contain gettext .po files that are all utf8-encoded.
Why might "diff -durpN locale-bak locale" produce a non-utf8-encoded diff file?

The file(1) command reports all of the .po files like this:

    GNU gettext message catalogue, Unicode text, UTF-8 text

It reports the resulting diff file like this:

    unified diff output, Non-ISO extended-ASCII text, with LF, NEL line terminators

This is with diff (GNU diffutils) 3.8 on debian12 and diff (GNU diffutils) 3.12 on macos-10.14.

Any idea what I'm doing wrong to make this happen? I would expect the diff output to be utf8-encoded (and readable in vim).

Hmm, if I leave out the -p option, it works correctly, and the diff output is utf8-encoded. Admittedly, I don't need the -p option for gettext .po files, but I always use the shell alias d='diff -durpN' and it usually does no harm with non-C files.

Using diff -p on two gettext .po files does produce valid utf8-encoded output, but diff -rp on two directories containing gettext .po files doesn't.

I tried again with a single language's translation in both directories, and it produced correct utf8. The original directories had 47 language directories each. So it doesn't always happen. I don't know how many files it takes for the problem to occur.

In case it's helpful, I've put a temporary copy of the two locale directories and the resulting diff output at raf.org/tmp/diffutils-rp-utf8.tar.gz (460K).

cheers,
raf
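
P.S. In case it helps anyone narrow this down, here is a minimal Python sketch (nothing to do with diffutils itself) that lists the lines of a saved diff that fail to decode as UTF-8. The file name locale.diff and the function name invalid_utf8_lines are placeholders I made up; substitute wherever the diff output was saved. It splits on LF only, so any bytes that file(1) interprets as NEL stay inside the line they came from.

```python
import sys


def invalid_utf8_lines(data: bytes):
    """Yield (line_number, raw_line) for each LF-terminated line that is not valid UTF-8."""
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            yield number, line


if __name__ == "__main__":
    # "locale.diff" is a hypothetical default name for the saved diff output.
    path = sys.argv[1] if len(sys.argv) > 1 else "locale.diff"
    with open(path, "rb") as f:
        for number, line in invalid_utf8_lines(f.read()):
            # Print the raw bytes (truncated) so the bad sequence is visible.
            print(number, line[:80])
```

If the offending lines all turned out to be "@@" hunk headers, that would point at the extra text that -p adds to those headers, but I haven't confirmed that.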
