The translation files we currently store in Git are full of redundant information: source strings for untranslated messages, and file locations. The first makes the files unnecessarily large. The second makes diffs unreadable: when code is edited and line numbers change, the location metadata of every affected message shows up as changed. This makes reviewing translation patches and resolving merge conflicts hard; in practice both require specialized tools.
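To illustrate the diff problem, here is a small sketch using Python's standard `difflib` module. The .po fragment and the path `ipalib/errors.py` are made up for the example. The only change between the two snapshots is that both messages moved down three lines. With location comments, every moved message shows up in the diff. With the comments removed (roughly what `msgattrib --no-location` does), the diff is empty.

```python
import difflib
from typing import List

# Hypothetical .po fragment; the path and line numbers are invented.
BEFORE = """\
#: ipalib/errors.py:120
msgid "Not found"
msgstr "Nicht gefunden"

#: ipalib/errors.py:135
msgid "Access denied"
msgstr "Zugriff verweigert"
"""

# Same file after an unrelated edit shifted the messages down by 3 lines.
AFTER = BEFORE.replace("errors.py:120", "errors.py:123").replace(
    "errors.py:135", "errors.py:138"
)


def changed_lines(old: str, new: str) -> List[str]:
    """Return the added/removed lines of a unified diff, without file headers."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    return [
        line
        for line in diff
        if line[:1] in ("+", "-") and not line.startswith(("+++", "---"))
    ]


def without_locations(text: str) -> str:
    """Drop '#:' location comments, keeping everything else."""
    return "\n".join(l for l in text.splitlines() if not l.startswith("#:"))


noisy = changed_lines(BEFORE, AFTER)
quiet = changed_lines(without_locations(BEFORE), without_locations(AFTER))
print(len(noisy), len(quiet))  # 4 0
```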

This patch changes the Makefile to strip the unneeded data from .po files.
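The effect of `msgattrib --translated --no-fuzzy --no-location` on a file can be approximated in a few lines of Python. This is a simplified illustration, not what the Makefile runs. It assumes that entries are separated by blank lines, and it ignores obsolete (`#~`) entries and other corner cases. The sketch keeps the header, drops `#:` location comments, and drops untranslated and fuzzy entries. When no translated messages remain, it returns an empty string. This mirrors the empty files that the Makefile then deletes. As far as we understand, gettext tools write no output in that case unless `--force-po` is given.

```python
from typing import List


def _is_header(lines: List[str]) -> bool:
    # The header has an empty msgid immediately followed by msgstr; a long
    # msgid also starts with msgid "" but continues with "..." lines instead.
    for i, line in enumerate(lines[:-1]):
        if line == 'msgid ""':
            return lines[i + 1].startswith("msgstr")
    return False


def _is_fuzzy(lines: List[str]) -> bool:
    return any(l.startswith("#,") and "fuzzy" in l for l in lines)


def _is_translated(lines: List[str]) -> bool:
    # True if any msgstr / msgstr[n], or a "..." continuation line that
    # follows it, carries non-empty text.
    in_msgstr = False
    for line in lines:
        if line.startswith("msgstr"):
            in_msgstr = True
            text = line.split(" ", 1)[1] if " " in line else '""'
        elif in_msgstr and line.startswith('"'):
            text = line
        else:
            in_msgstr = False
            continue
        if text.strip() != '""':
            return True
    return False


def strip_po(text: str) -> str:
    """Simplified stand-in for: msgattrib --translated --no-fuzzy --no-location."""
    header, messages = [], []
    for block in text.strip().split("\n\n"):
        lines = [l for l in block.splitlines() if not l.startswith("#:")]
        if _is_header(lines):
            header.append("\n".join(lines))
        elif _is_translated(lines) and not _is_fuzzy(lines):
            messages.append("\n".join(lines))
    if not messages:
        return ""  # nothing translated: the file ends up empty
    return "\n\n".join(header + messages) + "\n"


SAMPLE = '''\
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

#: ipalib/errors.py:120
msgid "Not found"
msgstr "Nicht gefunden"

#: ipalib/errors.py:135
msgid "Access denied"
msgstr ""

#: ipalib/errors.py:150
#, fuzzy
msgid "Invalid value"
msgstr "Ungueltig"
'''

print(strip_po(SAMPLE))  # header plus the single translated entry
```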

Translators using Git must now run msgmerge (or `make merge-po`) to get .po files they can work with. Transifex users are unaffected, as the source .pot file is not changed.

The i18n tests use file locations for producing nice error reports¹.
To make this work as before, the .pot file is merged in before validation to restore the comments. Currently this takes a noticeable amount of time, because polib uses a particularly naïve merging algorithm. I've sent a patch to polib to fix this; once it makes it downstream, merging will be fast again.
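A naive merge can become quadratic if it scans the whole template for every entry. The following is a minimal sketch of a linear-time alternative. It assumes that entries are matched on `(msgctxt, msgid)`, indexes the .pot once in a dictionary, and then looks up each translated entry in O(1) on average. The `Entry` class and the `restore_comments` function are hypothetical and only loosely modeled on polib's entries. This is not the patch sent to polib. Unlike msgmerge, the sketch only restores locations and comments and does not add untranslated entries, which the validator does not need.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Entry:
    """Hypothetical stand-in for a catalog entry (loosely modeled on polib)."""
    msgid: str
    msgstr: str = ""
    msgctxt: Optional[str] = None
    occurrences: List[Tuple[str, str]] = field(default_factory=list)  # (file, line)
    comment: str = ""  # extracted comment, e.g. a hint for translators


Key = Tuple[Optional[str], str]


def restore_comments(po_entries: List[Entry], pot_entries: List[Entry]) -> List[Entry]:
    """Copy locations and extracted comments from the template onto matching
    translated entries, in place. Building the index is O(len(pot)) and each
    lookup is O(1) on average, so the whole pass is linear."""
    by_key: Dict[Key, Entry] = {(e.msgctxt, e.msgid): e for e in pot_entries}
    for entry in po_entries:
        ref = by_key.get((entry.msgctxt, entry.msgid))
        if ref is not None:
            entry.occurrences = list(ref.occurrences)
            entry.comment = ref.comment
    return po_entries


pot = [
    Entry("Not found", occurrences=[("ipalib/errors.py", "123")]),
    Entry("Access denied", occurrences=[("ipalib/errors.py", "138")]),
]
po = [Entry("Not found", msgstr="Nicht gefunden")]
restore_comments(po, pot)
print(po[0].occurrences)  # [('ipalib/errors.py', '123')]
```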

Updating the translations with the new Makefile will cause a >5MB patch. I don't want to pollute the mailing list with it, at least until the Makefile patch is reviewed. It's available

¹ And for divining the programming language messages come from, but that is only done on the .pot file, unaffected by this patch.
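Divining the language from file locations can be sketched as a lookup on the file extension in each `#:` comment. The extension map and the paths below are hypothetical, and the real validator may use different rules. A pure extension check would miss Python files that lack the .py extension, which the README mentions. The last call shows why this only works on the .pot file: a stripped .po entry carries no locations at all.

```python
import os
from typing import Iterable, Set, Tuple

# Hypothetical extension-to-language map, for illustration only.
EXTENSION_LANGS = {".py": "python", ".c": "c", ".h": "c", ".js": "javascript"}


def guess_languages(occurrences: Iterable[Tuple[str, str]]) -> Set[str]:
    """Guess the programming languages a message is used from, based on the
    (path, line) pairs taken from its '#: path:line' comments."""
    langs = set()
    for path, _line in occurrences:
        lang = EXTENSION_LANGS.get(os.path.splitext(path)[1])
        if lang is not None:
            langs.add(lang)
    return langs


# Hypothetical locations as they might appear in the .pot file.
print(sorted(guess_languages([("ipalib/errors.py", "12"), ("util/ipa_krb5.c", "40")])))
# ['c', 'python']
print(guess_languages([]))  # set(): a stripped .po carries no locations
```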


From 16b20b737225908311f98e55db0938515e1abad6 Mon Sep 17 00:00:00 2001
From: Petr Viktorin <>
Date: Wed, 20 Jun 2012 06:38:16 -0400
Subject: [PATCH] Arrange stripping .po files

The .po files we use for translations have two shortcomings when used in Git:
- They include file locations, which change each time the source is updated.
  This results in large, unreadable diffs that don't merge well.
- They include source strings for untranslated messages, wasting space.

Update the Makefile so that the extraneous information is stripped when the
files are updated or pulled from Transifex, and empty translation files are
removed entirely.
Also, translations are normalized to a common style. This should help diffs
and merges.

The validator requires file location comments to identify the programming
language, and to produce good error reports.
To make this work, merge the comments in before validation.

First patch for:
 install/   |    5 +++++
 install/po/ |   20 +++++++++++++++++---
 install/po/README      |   16 ++++++++++++++--
 tests/          |   12 ++++++++++--
 4 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/install/ b/install/
index 827ddbab411a4aa8abbdd4488e217ce67046bd6b..9e781a684429191b3c5eb46aed4fceecc9be6586 100644
--- a/install/
+++ b/install/
@@ -48,6 +48,11 @@ if test "x$MSGCMP" = "xno"; then
     AC_MSG_ERROR([msgcmp not found, install gettext])
 fi
+AC_PATH_PROG(MSGATTRIB, msgattrib, [no])
+if test "x$MSGATTRIB" = "xno"; then
+    AC_MSG_ERROR([msgattrib not found, install gettext])
+fi
 AC_PATH_PROG(TX, tx, [/usr/bin/tx])
diff --git a/install/po/ b/install/po/
index 9a3dde78a20a6beb35ab08230331f28b7ea3161d..c1a9bc8b8962fa2f9c7ff2bf541f5996e34a642f 100644
--- a/install/po/
+++ b/install/po/
@@ -14,6 +14,7 @@ MSGFMT = @MSGFMT@
 TX = @TX@
 IPA_TEST_I18N = ../../tests/
@@ -67,25 +68,34 @@ C_POTFILES = $(C_FILES) $(H_FILES)
 .SUFFIXES: .po .mo
-.PHONY: all create-po update-po update-pot install mostlyclean clean distclean test mo-files debug
+.PHONY: all create-po update-po update-pot install mostlyclean clean distclean test mo-files debug strip-po merge-po
 SUFFIXES = .po .mo
 	@echo Creating $@; \
 	$(MSGFMT) -c -o t-$@ $< && mv t-$@ $@
-$(po_files): $(DOMAIN).pot
+$(po_files): update-pot
 	@if [ ! -f $@ ]; then \
 	    lang=`echo $@ | $(SED) -r -e 's/\.po$$//'` # Strip .po suffix ; \
 	    echo Creating nonexistent $@, you should add this file to your SCM repository; \
 	    $(MSGINIT) --locale $$lang --no-translator -i $(DOMAIN).pot -o $@; \
 	fi; \
 	echo Merging $(DOMAIN).pot into $@; \
 	$(MSGMERGE) --no-fuzzy-matching -o $@ $@ $(DOMAIN).pot
+strip-po:
+	@for po_file in $$(ls *.po); do \
+		echo Stripping $$po_file; \
+		$(MSGATTRIB) --translated --no-fuzzy --no-location $$po_file > $$po_file.tmp; \
+		mv $$po_file.tmp $$po_file; \
+	done
+	@echo Remove empty translation files; \
+	find . -name '*.po' -empty -exec rm -v {} \;
 create-po: $(DOMAIN).pot
 	@for po_file in $(po_files); do \
 	    if [ ! -e $$po_file ]; then \
@@ -98,10 +108,14 @@ create-po: $(DOMAIN).pot
 	cd ../..; $(TX) pull -f
+	$(MAKE) strip-po
-update-po: update-pot
+merge-po: update-pot
 	$(MAKE) $(po_files)
+update-po: merge-po
+	$(MAKE) strip-po
 	@rm -f $(DOMAIN).pot.update
 	@pushd ../.. ; \
diff --git a/install/po/README b/install/po/README
index ada7df40e3f294b204a5d44c267ee57ebe734042..6894a06337fac68675cb1a852ca828c54da74f96 100644
--- a/install/po/README
+++ b/install/po/README
@@ -6,28 +6,40 @@ A: Edit and add the source file to the appropriate *_POTFILES list.
    NOTE: Now this i only necessary for python files that lack the .py
          extension. All .py, .c and .h files are automatically sourced.
+Q: Untranslated strings and file locations are missing from my .po file.
+   How do I add them?
+A: make merge-po
+   Untranslated strings and file locations are left out of the files in SCM.
+   The merge-po command runs msgmerge to add them back.
 Q: How do I pick up new strings to translate from the source files after the
    source have been modified?
-A: make update-po
+A: make merge-po
    This regenerates the pot template file by scanning all the source files.
    Then the new strings are merged into each .po file from the new pot file.
 Q: How do I just regenerate the pot template file without regenerating all the
    .po files?
 A: make update-pot
+Q: I am done translating. How do I commit my changes?
+A: Run `make strip-po` to remove unneeded information from the po files, then
+   add your changes to SCM.
 Q: How do I add a new language for translation?
 A: Edit the LINGUAS file and add the new language. Then run "make create-po".
    This will generate a new .po file for each language which doesn't have one
    yet. Be sure to add the new .po file(s) to the source code repository.  For
    certain languages, you may have to edit the Plurals line.  See:
    However, if this line is wrong, it is often an indicator that the locale
    value is incorrect.  For example, using 'jp' for Japanese in stead of 'ja'
-   will result in an invailid Plural's line.
+   will result in an invalid Plurals line.
 Q: What files must be under source code control?
diff --git a/tests/ b/tests/
index 703dc8bbb0962612fdd29f35c7fde06ffcd58406..9c8479bb0a7b2a32d413a58fb5b052afa2866f35 100755
--- a/tests/
+++ b/tests/
@@ -367,7 +367,7 @@ def validate_positional_substitutions(s, prog_langs, s_name='string'):
     return errors
-def validate_file(file_path, validation_mode):
+def validate_file(file_path, validation_mode, reference_pot=None):
     Given a pot or po file scan all it's entries looking for problems
     with variable substitutions. See the following functions for
@@ -378,6 +378,9 @@ def validate_file(file_path, validation_mode):
     * validate_positional_substitutions()
     Returns the number of entries with errors.
+    For po files, ``reference_pot`` gives a pot file to merge with (to recover
+    comments and file locations).
     def emit_messages():
@@ -419,6 +422,9 @@ def emit_messages():
         return Result(n_entries=n_entries, n_msgids=n_msgids, n_msgstrs=n_msgstrs, n_warnings=n_warnings, n_errors=n_errors)
+    if validation_mode == 'po' and reference_pot:
+        # Merge the .pot file for comments and file locations
+        po.merge(reference_pot)
     if validation_mode == 'po':
         plural_forms = po.metadata.get('Plural-Forms')
@@ -754,12 +760,14 @@ def main():
             if not files:
                 files = [options.pot_file]
             validation_mode = 'pot'
+            reference_pot = None
         elif options.mode == 'validate_po':
             files = args
             if not files:
                 print >> sys.stderr, 'ERROR: no po files specified'
                 return 1
             validation_mode = 'po'
+            reference_pot = polib.pofile(options.pot_file)
         else:
             print >> sys.stderr, 'ERROR: unknown validation mode "%s"' % (options.mode)
             return 1
@@ -771,7 +779,7 @@ def main():
         total_errors = 0
         for f in files:
-            result = validate_file(f, validation_mode)
+            result = validate_file(f, validation_mode, reference_pot)
             total_entries += result.n_entries
             total_msgids += result.n_msgids
             total_msgstrs += result.n_msgstrs

