On 2025-09-08 00:13, Bruno Haible wrote:
> Paul Eggert wrote:
>> I meant to not worry about platforms where the "C" (not "C.utf8") locale
>> is multibyte. I don't know of how diffutils would misbehave in such
>> locales (other than not be strictly POSIX-conforming in unusual cases
>> where native tools aren't either), so I wanted Gnulib to not worry about
>> the possibility.
>
> This possibility actually occurs on Android ≥ 5.0.

Yes, and if it causes a real problem in diffutils we should fix that as it comes up. I don't offhand know why it'd be a real problem.

>   * hard_locale_LC_TIME assumes that no locale, not even the en_US locale,
>     uses the same internal format string for "%c" as the C locale.

The assumption is a bit different: it's merely that only in the POSIX locale does "%c" produce that output for that particular date. Even if the other locale uses the same internal format string "%a %b %e %T %Y", in a non-English locale it's quite likely that its day and month abbreviations won't match POSIX's English-language abbreviations, so the output will still differ.

I don't know of any platform where hard_locale_LC_TIME incorrectly returns false. However, even if it does, diffutils' behavior will still be OK: it'll conform to POSIX and users will surely understand the output. And if a user complains about this extremely minor glitch I assume we can fix that as it comes up.
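
As a rough illustration of the kind of check described above, here is a minimal C sketch. It is not the actual diffutils code: the function name time_format_differs_from_posix and the choice of timestamp are assumptions made for this example. It formats one fixed date with "%c" and compares the result with what the POSIX locale prints for that date.

```c
#include <locale.h>   /* For the caller's setlocale call.  */
#include <stdbool.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: return true if "%c" output for one fixed date
   differs from what the POSIX locale produces for that date.  */
static bool
time_format_differs_from_posix (void)
{
  /* 1970-01-01 00:00:00, a Thursday; chosen arbitrarily for this sketch.  */
  struct tm tm = { .tm_year = 70, .tm_mon = 0, .tm_mday = 1,
                   .tm_wday = 4, .tm_yday = 0 };
  char buf[128];

  /* In the POSIX locale "%c" is equivalent to "%a %b %e %H:%M:%S %Y".  */
  static char const posix_output[] = "Thu Jan  1 00:00:00 1970";

  if (strftime (buf, sizeof buf, "%c", &tm) == 0)
    return true;  /* No usable output, so it is not the POSIX form.  */
  return strcmp (buf, posix_output) != 0;
}
```

A caller would invoke setlocale (LC_ALL, "") first. A locale that shares the POSIX format string but has non-English abbreviations would still make this return true, because the comparison is on the output and not on the format string.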

>        - the translator will not translate "(C)" by "(C)",
>        - the user does not use LANGUAGE with a precedence list.

Not quite following, but it's OK if in unusual cases the program outputs "(C)" when "©" would be better, so long as in ordinary cases "©" is output when it works, and so long as "©" is not output when it would display as gibberish.
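
To make that goal concrete, here is a small C sketch of one way to pick between the two forms. It is only an illustration under stated assumptions, not what diffutils or Gnulib does: the function name copyright_sign is hypothetical, and the sketch treats "it works" as "LC_MESSAGES is not C or POSIX and the locale's codeset is UTF-8", which is narrower than what a translation catalog with charset conversion can handle.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return "©" (as UTF-8 bytes) only when the
   messages locale is not C/POSIX and the codeset is UTF-8;
   otherwise return the ASCII fallback "(C)".  */
static char const *
copyright_sign (void)
{
  char const *messages = setlocale (LC_MESSAGES, NULL);
  bool c_messages = (!messages
                     || strcmp (messages, "C") == 0
                     || strcmp (messages, "POSIX") == 0);

  /* Assumption for this sketch: only UTF-8 is considered safe.  */
  bool utf8 = strcmp (nl_langinfo (CODESET), "UTF-8") == 0;

  return !c_messages && utf8 ? "\302\251" : "(C)";
}
```

With this rule a program that has called setlocale (LC_ALL, "") prints "(C)" in the C locale and in non-UTF-8 locales, where the two-byte sequence could display as gibberish.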

>    * hard_locale_LC_MESSAGES assumes that
>        - diffutils.pot contains the strings from lib/version-etc.c
>          (which are now actually in gnulib.pot),

Yes, that's a problem, and thanks for mentioning it. It stems from quite a comedy of errors:

(a) diffutils' en translation is not installed.

(b) cmp looks in the wrong catalog for the "(C)" message.

(c) The gnulib.pot/gnulib.mo mechanism is not yet working widely even for packages other than diffutils. On current Fedora 42 if I run this shell command:

  LC_ALL=en_US.utf8 cat --version

although I see "Torbjörn" in UTF-8 as desired, I also see "Copyright (C) 2025" which is wrong: it should be "Copyright © 2025". Worse, I see the exact same English message when I run this shell command:

  LC_ALL=fr_FR.utf8 cat --version

This is because even though /usr/share/locale/fr/LC_MESSAGES/coreutils.mo is installed, there is no file /usr/share/locale/fr/LC_MESSAGES/gnulib.mo, and Fedora does not supply a gnulib.mo file in any package that I can see. I reported this newish bug to Fedora yesterday <https://bugzilla.redhat.com/show_bug.cgi?id=2393892>.

(d) In response to that Fedora bug report, Lukáš Zaoral set in motion a fix. But he asked, "Since gnulib is meant to be bundled, how do you deal with the situation when the messages in the sources of the bundled gnulib and gnulib-l10n differ? Do you have some upstream policy to make sure that they don't diverge?" Do we have an answer for that? I'm not sure myself.

(e) Even if Fedora started installing a gnulib.mo file, diffutils' "make install" does not install one, so a standalone build of diffutils configured with './configure --prefix' would still not find the Gnulib translations.


Given all this configuration mess, for now I took the following conservative approach in Diffutils.

(0) Update diffutils to use need-formatstring-macros when calling AM_GNU_GETTEXT. I discovered this issue while looking into the other problems. Perhaps need-formatstring-macros should be the only behavior nowadays? It hardly seems worth the hassle of worrying about older gettext versions.

(1) Change cmp's hard_locale_LC_MESSAGES to test via setlocale, not via gettext. setlocale should work fine if ENABLE_NLS is nonzero in Diffutils.

(2) Remove diffutils' po/en.po file. It is an unused revenant.

(3) Stick with the longstanding approach of having the Diffutils message catalog translate all messages, including those taken from Gnulib. This has worked for decades, translators are used to it, and the Gnulib part of the catalog hardly ever changes.

(4) Modify Gnulib to let Diffutils override the textdomain that Gnulib uses. Done via Gnulib commit <https://cgit.git.savannah.gnu.org/cgit/gnulib.git/commit/?id=2b2bcdbc3bf3de2838a4b5051e32366e9a94f1e3>.

(5) Use this new Gnulib feature in Diffutils.

I installed the attached patches to Diffutils to do this.

An alternative to (4) and (5) would be to let config.h specify the "_" macro, and have Gnulib define this macro only if it is not already defined. This would make for slightly smaller executables. However, it would be brittler and more intrusive. Or perhaps you can think of a better way to do what is wanted in (3).
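
For readers unfamiliar with the mechanism, here is a minimal C sketch of the idea behind (4) and (5). It is an illustration only and does not reproduce Gnulib's actual source: the function library_copyright_marker is a hypothetical stand-in for Gnulib code that translates a msgid, and the fallback to a separate "gnulib" domain when GNULIB_TEXT_DOMAIN is undefined is an assumption of this sketch.

```c
#include <libintl.h>

/* A package's config.h may define GNULIB_TEXT_DOMAIN (diffutils defines
   it as PACKAGE).  Assumption for this sketch: without it, library
   messages are looked up in a separate "gnulib" domain.  */
#ifndef GNULIB_TEXT_DOMAIN
# define GNULIB_TEXT_DOMAIN "gnulib"
#endif

/* Hypothetical stand-in for library code that translates a msgid.
   dgettext consults the named domain, not the program's default
   domain set by textdomain.  */
static char const *
library_copyright_marker (void)
{
  return dgettext (GNULIB_TEXT_DOMAIN, "(C)");
}
```

When the lookup domain is the package's own, the single bindtextdomain (PACKAGE, LOCALEDIR) call in main covers library messages as well, which is why the extra bindtextdomain ("gnulib", GNULIB_LOCALEDIR) calls can be dropped. If no catalog supplies a translation, dgettext returns the msgid unchanged.
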
From cf5648869a98020cce63755b760da354ad166765 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Tue, 9 Sep 2025 09:12:11 -0700
Subject: [PATCH 1/5] maint: use need-formatstring-macros

* configure.ac: Pass need-formatstring-macros, not merely
need-ngettext, to AM_GNU_GETTEXT.  This is mostly for show, as
diffutils has used format string macros for years and since nobody
uses ancient gettext any more nobody has noticed a problem.
---
 configure.ac | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index aec74ce..23aa615 100644
--- a/configure.ac
+++ b/configure.ac
@@ -212,7 +212,7 @@ test -f $srcdir/.tarball-version \
   && SRC_VERSION_C= \
   || SRC_VERSION_C=../src/version.c
 
-AM_GNU_GETTEXT([external], [need-ngettext])
+AM_GNU_GETTEXT([external], [need-formatstring-macros])
 AM_GNU_GETTEXT_VERSION([0.19.2])
 XGETTEXT="AWK='$AWK' \$(SHELL) \$(top_srcdir)/exgettext $XGETTEXT"
 
-- 
2.48.1

From dc6dc9147f9a5ce87514d0d035e44ab515873019 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Tue, 9 Sep 2025 09:16:16 -0700
Subject: [PATCH 2/5] cmp: improve LC_MESSAGES test

* src/cmp.c (hard_locale_LC_MESSAGES): Use setlocale, not gettext,
to decide whether the messages might not be those of the C or
POSIX locale.  This is a more reliable way to test whether
the locale is something like en_US.utf8, a locale that does
not have a translation catalog but is not the C locale.
---
 src/cmp.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/cmp.c b/src/cmp.c
index 0c4c80c..b7bb204 100644
--- a/src/cmp.c
+++ b/src/cmp.c
@@ -45,12 +45,15 @@ static char const PROGRAM_NAME[] = "cmp";
   proper_name_lite ("Torbjorn Granlund", "Torbj\303\266rn Granlund"), \
   _("David MacKenzie")
 
+/* Return true if the locale's messages might not be those of C or POSIX.  */
 static bool
 hard_locale_LC_MESSAGES (void)
 {
-#if defined LC_MESSAGES && ENABLE_NLS
-  static char const copyright_string[] = "(C)";
-  return gettext (copyright_string) != copyright_string;
+#if ENABLE_NLS
+  /* GNU diff defines ENABLE_NLS only if gettext is preinstalled, and
+     on these platforms setlocale (LC_MESSAGES, nullptr) never returns nullptr
+     and always returns "C" when in the C or POSIX locales.  */
+  return !STREQ (setlocale (LC_MESSAGES, nullptr), "C");
 #else
   return false;
 #endif
-- 
2.48.1

From 096a3b29b535847f7dcf0dfbc70501b1e2dc4ce2 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Tue, 9 Sep 2025 09:22:21 -0700
Subject: [PATCH 3/5] maint: remove po/en.po
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* po/en.po: Remove.  It wasn’t being used, and we’re now doing its
intent in a less-hassly way.
---
 po/en.po | 30 ------------------------------
 1 file changed, 30 deletions(-)
 delete mode 100644 po/en.po

diff --git a/po/en.po b/po/en.po
deleted file mode 100644
index 4fbc71c..0000000
--- a/po/en.po
+++ /dev/null
@@ -1,30 +0,0 @@
-# English messages for GNU diffutils
-# Copyright 1998, 2001-2003, 2009-2013, 2015-2025 Free Software Foundation,
-# Inc.
-# Paul Eggert <egg...@twinsun.com>, 1998
-#
-msgid ""
-msgstr ""
-"Project-Id-Version: GNU diffutils 3.2\n"
-"POT-Creation-Date: 2002-06-16 23:44-0700\n"
-"PO-Revision-Date: 2012-01-25 23:11-0700\n"
-"Last-Translator: Paul Eggert <egg...@cs.ucla.edu>\n"
-"Language-Team: English <e...@translate.freefriends.org>\n"
-"MIME-Version: 1.0\n"
-"Content-Type: text/plain; charset=UTF-8\n"
-"Content-Transfer-Encoding: 8bit\n"
-
-#. TRANSLATORS: Please translate "(C)" to the C-in-a-circle symbol
-#. (U+00A9, COPYRIGHT SIGN) if possible, as this has some minor
-#. technical advantages in international copyright law.  If the
-#. copyright symbol is not available, please leave it as "(C)".
-#: lib/version-etc.c:50
-msgid "(C)"
-msgstr "©"
-
-#. TRANSLATORS: Please translate the second "o" in "Torbjorn Granlund"
-#. to an o-with-umlaut (U+00F6, LATIN SMALL LETTER O WITH DIAERESIS)
-#. if possible.
-#: src/cmp.c:47
-msgid "Written by Torbjorn Granlund and David MacKenzie."
-msgstr "Written by Torbjörn Granlund and David MacKenzie."
-- 
2.48.1

From 3ccfcd8cd77b819ee6a47670b39153451da2844e Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Tue, 9 Sep 2025 10:00:00 -0700
Subject: [PATCH 4/5] build: update gnulib submodule to latest

---
 gnulib | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gnulib b/gnulib
index 62e51f9..2b2bcdb 160000
--- a/gnulib
+++ b/gnulib
@@ -1 +1 @@
-Subproject commit 62e51f91499a7844a9de9ee7553a79a93c80ce37
+Subproject commit 2b2bcdbc3bf3de2838a4b5051e32366e9a94f1e3
-- 
2.48.1

From 6b9c72607683bcd73d4c5832cf9e94ae979572f0 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Tue, 9 Sep 2025 10:03:27 -0700
Subject: [PATCH 5/5] maint: use our textdomain for Gnulib
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Support diffutils’ traditional way of getting translations,
by telling Gnulib to use diffutils’ message catalog.
* configure.ac (GNULIB_TEXT_DOMAIN): New macro.
* src/cmp.c, src/diff.c, src/diff3.c, src/sdiff.c (main):
Don’t call bindtextdomain ("gnulib", GNULIB_LOCALEDIR)
as the existing bindtextdomain (PACKAGE, LOCALEDIR) call suffices.
---
 configure.ac | 4 ++++
 src/cmp.c    | 1 -
 src/diff.c   | 1 -
 src/diff3.c  | 1 -
 src/sdiff.c  | 1 -
 5 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/configure.ac b/configure.ac
index 23aa615..19bc5ef 100644
--- a/configure.ac
+++ b/configure.ac
@@ -76,6 +76,10 @@ AC_DEFINE([REQUIRE_GNUISH_STRFTIME_AM_PM], [0],
 AC_DEFINE([SUPPORT_NON_GREG_CALENDARS_IN_STRFTIME], [false],
   [Do not worry about GNU strftime behavior for non-Gregorian calendars.])
 
+# Diffutils translates Gnulib's msgids too.
+AC_DEFINE([GNULIB_TEXT_DOMAIN], [PACKAGE],
+  [Textdomain to use when translating Gnulib's msgids.])
+
 AC_C_INLINE
 
 AC_CHECK_MEMBERS([struct stat.st_rdev])
diff --git a/src/cmp.c b/src/cmp.c
index b7bb204..84d0e8f 100644
--- a/src/cmp.c
+++ b/src/cmp.c
@@ -211,7 +211,6 @@ main (int argc, char **argv)
   set_program_name (argv[0]);
   setlocale (LC_ALL, "");
   bindtextdomain (PACKAGE, LOCALEDIR);
-  bindtextdomain ("gnulib", GNULIB_LOCALEDIR);
   textdomain (PACKAGE);
   c_stack_action (nullptr);
   xstdopen ();
diff --git a/src/diff.c b/src/diff.c
index 8fd2e1a..775f8fc 100644
--- a/src/diff.c
+++ b/src/diff.c
@@ -318,7 +318,6 @@ main (int argc, char **argv)
   set_program_name (argv[0]);
   setlocale (LC_ALL, "");
   bindtextdomain (PACKAGE, LOCALEDIR);
-  bindtextdomain ("gnulib", GNULIB_LOCALEDIR);
   textdomain (PACKAGE);
   c_stack_action (nullptr);
   function_regexp_list.buf = &function_regexp;
diff --git a/src/diff3.c b/src/diff3.c
index 7ac046a..82f3237 100644
--- a/src/diff3.c
+++ b/src/diff3.c
@@ -233,7 +233,6 @@ main (int argc, char **argv)
   set_program_name (argv[0]);
   setlocale (LC_ALL, "");
   bindtextdomain (PACKAGE, LOCALEDIR);
-  bindtextdomain ("gnulib", GNULIB_LOCALEDIR);
   textdomain (PACKAGE);
   c_stack_action (nullptr);
   xstdopen ();
diff --git a/src/sdiff.c b/src/sdiff.c
index a743517..a21b081 100644
--- a/src/sdiff.c
+++ b/src/sdiff.c
@@ -449,7 +449,6 @@ main (int argc, char *argv[])
   set_program_name (argv[0]);
   setlocale (LC_ALL, "");
   bindtextdomain (PACKAGE, LOCALEDIR);
-  bindtextdomain ("gnulib", GNULIB_LOCALEDIR);
   textdomain (PACKAGE);
   c_stack_action (cleanup);
   xstdopen ();
-- 
2.48.1
