Hi,

I have a locale directory and an older locale-bak
directory in a project's source directory. They both
contain gettext .po files that are all utf8-encoded.

Why might "diff -durpN locale-bak locale" produce a
non-utf8-encoded diff file?

The file(1) command reports all of the .po files like this:

  GNU gettext message catalogue, Unicode text, UTF-8 text

It reports the resulting diff file like this:

  unified diff output, Non-ISO extended-ASCII text, with LF, NEL line terminators
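To narrow down where the diff output stops being valid UTF-8, a small checker can report the first offending byte and its line number. This is a minimal Python sketch, not part of diffutils; the function name first_invalid_utf8 and the idea of running it as a script (e.g. `python3 check_utf8.py locale.diff`, where both file names are hypothetical) are illustrative assumptions.

```python
import sys


def first_invalid_utf8(data: bytes):
    """Return (byte_offset, line_number) of the first byte that breaks
    UTF-8 decoding, or None if the whole buffer is valid UTF-8."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        # err.start is the byte offset where decoding failed;
        # count preceding newlines to get a 1-based line number.
        line = data.count(b"\n", 0, err.start) + 1
        return err.start, line


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            result = first_invalid_utf8(f.read())
        if result is None:
            print(f"{path}: valid UTF-8")
        else:
            offset, line = result
            print(f"{path}: invalid UTF-8 at byte {offset} (line {line})")
```

Opening the reported line in vim (e.g. with `:set binary`) should then show whether the bad bytes sit in a hunk header or in the body of the diff.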

This is with diff (GNU diffutils) 3.8 on debian12 and
diff (GNU diffutils) 3.12 on macos-10.14.

Any idea what I'm doing wrong to make this happen?
I would expect the diff output to be utf8-encoded
(and readable in vim).

Hmm, if I leave out the -p option, it works correctly,
and the diff output is utf8-encoded.

Admittedly, I don't need the -p option for gettext .po
files, but I always use the shell alias d='diff -durpN'
and it usually does no harm with non-C files.

Using diff -p on two gettext .po files does produce
valid utf8-encoded output, but diff -rp on two
directories containing gettext .po files doesn't.
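One hypothesis that would fit these symptoms, not confirmed by anything above: with -p, diff puts the nearest preceding "function" line after each @@ hunk header. If that text is cut off at a fixed byte count rather than at a character boundary, a multi-byte UTF-8 character in a msgid/msgstr line could be split in half, leaving stray bytes such as 0x85 (which file(1) may count as NEL). The Python sketch below is not diffutils code; the 40-byte limit and the helper names truncate_bytes and truncate_on_char_boundary are hypothetical, chosen only to illustrate the effect and a boundary-safe alternative.

```python
def truncate_bytes(text: str, limit: int) -> bytes:
    """Hypothetical naive cut: keep the first `limit` bytes of the
    UTF-8 encoding, ignoring character boundaries."""
    return text.encode("utf-8")[:limit]


def truncate_on_char_boundary(text: str, limit: int) -> bytes:
    """Cut to at most `limit` bytes without splitting a multi-byte
    UTF-8 character."""
    data = text.encode("utf-8")
    if len(data) <= limit:
        return data
    end = limit
    # data[end] is the first byte dropped; while it is a continuation
    # byte (0b10xxxxxx), the cut falls inside a character, so back up.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


if __name__ == "__main__":
    # 39 ASCII bytes followed by U+00C5 (encoded as C3 85) = 41 bytes.
    line = "a" * 39 + "\u00c5"
    print(truncate_bytes(line, 40))             # ends in a lone lead byte
    print(truncate_on_char_boundary(line, 40))  # stops before the split char
```

If something like this is the cause, it would also explain why only some of the 47 language directories trigger it: the problem would appear only when a non-ASCII character straddles the cutoff.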

I tried again with a single language's translation in
both directories, and it produced correct utf8. The
original directories had 47 language directories each.
So it doesn't always happen. I don't know how many
files it takes for the problem to occur.

In case it's helpful, I've put a temporary copy of the
two locale directories and the resulting diff output at
raf.org/tmp/diffutils-rp-utf8.tar.gz (460K).

cheers,
raf