Jeroen T. Vermeulen has proposed merging lp:~jtv/launchpad/bug-722568 into 
lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  #722568 Empty TranslationMessages confuse stats
  https://bugs.launchpad.net/bugs/722568

For more details, see:
https://code.launchpad.net/~jtv/launchpad/bug-722568/+merge/50593

= Summary =

Translation statistics are off.  That's because a translatable message can have 
an empty translation, which is different in database terms from it having no 
translation at all but still means it should count as untranslated.

The query that collects translation statistics for POFiles included messages 
with empty translations, although they were correctly noted as untranslated on 
the "other side" (the two "sides" of translation being Ubuntu and upstream).  
In the query that counts and classifies translated messages, this gave rise to 
the nonsensical category of translated messages that were not translated on the 
other side, yet were translated identically between the two sides.


== Proposed fix ==

Exclude empty translations from the query that counts translated messages.


== Pre-implementation notes ==

Chatted this over with the rest of the former Translations team.  At first it 
looked as if the problem was in the ancient, highly confused, and badly 
documented field of statistics interpretation.  This has always been messy, but 
also underwent deep changes with the introduction of message sharing, and 
later, Ubuntu/upstream message sharing (the Recife model).


== Implementation details ==

The statistics code already generated a string to check for empty 
TranslationMessages, but on the "other side" only.  All I had to do was extract 
it into a function, re-use it for the message on "this side," and stick the 
result in the WHERE clause as an extra condition.

I also updated the rosettastats docstrings, since they were vastly out of date.


== Tests ==

The tests on statistics are many and varied, making it a shame that this 
problem wasn't caught before.  You can exercise them with:
{{{
./bin/test -vvc lp.translations.tests.test_pofile
}}}

Of the new tests I added there, one was designed to—and did—pass without the 
fix: test_empty_messages_on_other_side_count_as_untranslated.  That's because 
of the existing clause that I re-used in this branch.


== Demo and Q/A ==

Run cronscripts/rosetta-pofile-stats.py.  This will take a long time.  Then 
verify POFile statistics by comparing the messages returned by the Translated 
and Untranslated translation filters to the numbers reported in the statistics.

Notably, check the zh_CN translation for WiserEarth trunk: 
https://translations.launchpad.net/wiserearth/trunk


= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/translations/tests/test_pofile.py
  lib/lp/translations/model/pofile.py
-- 
https://code.launchpad.net/~jtv/launchpad/bug-722568/+merge/50593
Your team Launchpad code reviewers is requested to review the proposed merge of 
lp:~jtv/launchpad/bug-722568 into lp:launchpad.
=== modified file 'lib/lp/translations/model/pofile.py'
--- lib/lp/translations/model/pofile.py	2011-01-26 23:33:20 +0000
+++ lib/lp/translations/model/pofile.py	2011-02-21 13:46:28 +0000
@@ -1,4 +1,4 @@
-# Copyright 2009-2010 Canonical Ltd.  This software is licensed under the
+# Copyright 2009-2011 Canonical Ltd.  This software is licensed under the
 # GNU Affero General Public License version 3 (see the file LICENSE).
 
 # pylint: disable-msg=E0611,W0212,W0231
@@ -116,6 +116,17 @@
     )
 
 
+def compose_sql_translationmessage_has_translations(tm_sql_identifier):
+    """Compose SQL for "`TranslationMessage` is nonempty.".
+
+    :param tm_sql_identifier: The SQL identifier for the
+        `TranslationMessage` in the query that's to be tested.
+    """
+    return "COALESCE(%s) IS NOT NULL" % ", ".join([
+        "%s.msgstr%d" % (tm_sql_identifier, form)
+        for form in xrange(TranslationConstants.MAX_PLURAL_FORMS)])
+
+
 class POFileMixIn(RosettaStats):
     """Base class for `POFile` and `DummyPOFile`.
 
@@ -827,15 +838,15 @@
         """Count `currentcount`, `updatescount`, and `rosettacount`."""
         side_traits = getUtility(ITranslationSideTraitsSet).getForTemplate(
             self.potemplate)
-        has_other_msgstrs = "COALESCE(%s) IS NOT NULL" % ", ".join([
-            "Other.msgstr%d" % form
-            for form in xrange(TranslationConstants.MAX_PLURAL_FORMS)])
         params = {
             'potemplate': quote(self.potemplate),
             'language': quote(self.language),
             'flag': side_traits.flag_name,
             'other_flag': side_traits.other_side_traits.flag_name,
-            'has_other_msgstrs': has_other_msgstrs,
+            'has_msgstrs':
+                compose_sql_translationmessage_has_translations("Current"),
+            'has_other_msgstrs':
+                compose_sql_translationmessage_has_translations("Other"),
         }
         # The "distinct on" combined with the "order by potemplate nulls
         # last" makes diverged messages mask their shared equivalents.
@@ -857,11 +868,11 @@
                     Other.potmsgset = TTI.potmsgset AND
                     Other.language = %(language)s AND
                     Other.%(other_flag)s IS TRUE AND
-                    Other.potemplate IS NULL AND
                     Other.potemplate IS NULL
                 WHERE
                     TTI.potemplate = %(potemplate)s AND
-                    TTI.sequence > 0
+                    TTI.sequence > 0 AND
+                    %(has_msgstrs)s
                 ORDER BY
                     TTI.potmsgset,
                     Current.potemplate NULLS LAST

=== modified file 'lib/lp/translations/tests/test_pofile.py'
--- lib/lp/translations/tests/test_pofile.py	2011-02-17 22:06:02 +0000
+++ lib/lp/translations/tests/test_pofile.py	2011-02-21 13:46:28 +0000
@@ -1,4 +1,4 @@
-# Copyright 2009-2010 Canonical Ltd.  This software is licensed under the
+# Copyright 2009-2011 Canonical Ltd.  This software is licensed under the
 # GNU Affero General Public License version 3 (see the file LICENSE).
 
 __metaclass__ = type
@@ -1824,6 +1824,65 @@
         self.assertEquals(self.pofile.newCount(), 0)
         self.assertEquals(self.pofile.updatesCount(), 1)
 
+    def test_empty_messages_count_as_untranslated(self):
+        # A message with all its msgstr* set to None counts as if
+        # there's no message at all.  It doesn't show up in any of the
+        # counts except as untranslated.
+        self.factory.makeCurrentTranslationMessage(
+            pofile=self.pofile, potmsgset=self.potmsgset, translations=[])
+        self.pofile.updateStatistics()
+        self.assertEqual(0, self.pofile.translatedCount())
+        self.assertEqual(1, self.pofile.untranslatedCount())
+        self.assertEqual(0, self.pofile.newCount())
+        self.assertEqual(0, self.pofile.updatesCount())
+
+    def test_empty_messages_on_this_side_count_as_untranslated(self):
+        # A POTMsgSet whose current TranslationMessage on this side is
+        # empty is counted only as untranslated, regardless of any
+        # translations it may have on the other side.
+        self.factory.makeCurrentTranslationMessage(
+            pofile=self.pofile, potmsgset=self.potmsgset, translations=[])
+        other_message = self.factory.makeSuggestion(
+            pofile=self.pofile, potmsgset=self.potmsgset)
+        other_message.is_current_ubuntu = True
+        self.pofile.updateStatistics()
+        self.assertEqual(0, self.pofile.translatedCount())
+        self.assertEqual(1, self.pofile.untranslatedCount())
+        self.assertEqual(0, self.pofile.newCount())
+        self.assertEqual(0, self.pofile.updatesCount())
+
+    def test_empty_messages_on_other_side_count_as_untranslated(self):
+        # A POTMsgSet that's translated on this side but has an empty
+        # translation on the other side counts as translated on this
+        # side, but not equal between both sides (currentCount) or
+        # translated differently between the two sides (updatesCount).
+        # Instead, it's counted as translated on this side but not on
+        # the other (newCount).
+        self.factory.makeCurrentTranslationMessage(
+            pofile=self.pofile, potmsgset=self.potmsgset)
+        other_message = self.factory.makeSuggestion(
+            pofile=self.pofile, potmsgset=self.potmsgset, translations=[])
+        other_message.is_current_ubuntu = True
+        self.pofile.updateStatistics()
+        self.assertEqual(1, self.pofile.translatedCount())
+        self.assertEqual(0, self.pofile.untranslatedCount())
+        self.assertEqual(1, self.pofile.newCount())
+        self.assertEqual(0, self.pofile.updatesCount())
+        self.assertEqual(0, self.pofile.currentCount())
+
+    def test_tracking_empty_messages_count_as_untranslated(self):
+        # An empty TranslationMessage that's current on both sides
+        # counts as untranslated.
+        self.factory.makeCurrentTranslationMessage(
+            pofile=self.pofile, potmsgset=self.potmsgset, translations=[],
+            current_other=True)
+        self.pofile.updateStatistics()
+        self.assertEqual(0, self.pofile.translatedCount())
+        self.assertEqual(1, self.pofile.untranslatedCount())
+        self.assertEqual(0, self.pofile.newCount())
+        self.assertEqual(0, self.pofile.updatesCount())
+        self.assertEqual(0, self.pofile.currentCount())
+
 
 class TestPOFile(TestCaseWithFactory):
     """Test PO file methods."""

_______________________________________________
Mailing list: https://launchpad.net/~launchpad-reviewers
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~launchpad-reviewers
More help   : https://help.launchpad.net/ListHelp

Reply via email to