Re: [PR] fix(i18n): don't flag intentional string deletions as translation regressions [superset]

via GitHub Wed, 03 Jun 2026 10:04:49 -0700


rusackas commented on code in PR #40716:
URL: https://github.com/apache/superset/pull/40716#discussion_r3350446646



##########
scripts/translations/check_translation_regression.py:
##########
@@ -169,26 +220,32 @@ def cmd_compare(
     report_path: Optional[str] = None,
 ) -> None:
     with open(before_path) as f:
-        before: dict[str, int] = json.load(f)
+        before_raw: dict[str, object] = json.load(f)
+    before = {lang: _normalize(entry) for lang, entry in before_raw.items()}
 
     after = get_counts(translations_dir)
 
+    # A regression is an *increase* in fuzzy entries: the PR's source diff
+    # renamed/reworded strings, leaving their committed translations stranded.
+    # A plain drop in the translated count is NOT used — deleting a string
+    # lowers it identically to a rename but is a legitimate change, and with
+    # `pybabel update --ignore-obsolete` a deletion creates no fuzzy entry.
     regressions: list[tuple[str, int, int]] = []
-    for lang, before_count in sorted(before.items()):
-        after_count = after.get(lang, 0)
-        if after_count < before_count:
-            regressions.append((lang, before_count, after_count))
+    for lang, before_stats in sorted(before.items()):
+        after_stats = after.get(lang, {"translated": 0, "fuzzy": 0})
+        if after_stats["fuzzy"] > before_stats["fuzzy"]:
+            regressions.append((lang, before_stats["fuzzy"], after_stats["fuzzy"]))

Review Comment:
   Good catch on the underlying scenario, though the fix needed to be narrower 
than "treat any missing baseline language as a regression." Intentionally 
deleting an entire catalog is a legitimate change (the same reason a drop in 
translated count is not flagged), so failing on every absence would defeat the 
purpose of this PR.
   
   The real gap was the *uncountable* case: a `.po` that is still present but 
fails `msgfmt` (malformed/corrupt) was silently skipped in `get_counts`, making 
it indistinguishable from a deletion and so passing as "no regression." Fixed 
in f713d309d7: `get_counts` now records count failures, and `cmd_compare` 
treats a baseline language whose catalog is present-but-uncountable as a hard 
failure, while a genuinely deleted catalog still passes. Added unit tests for 
all three cases (deleted catalog passes, uncountable baseline catalog fails, 
uncountable non-baseline catalog ignored).
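
   For readers skimming the thread, the three cases can be sketched roughly as 
below. This is an illustration under assumptions, not the code in f713d309d7: 
the function name `classify_catalogs` and the `uncountable` parameter are 
hypothetical, and the real `get_counts` / `cmd_compare` signatures may differ.

```python
# Hypothetical sketch; names and signatures are assumptions, not the PR's code.
Stats = dict[str, int]


def classify_catalogs(
    before: dict[str, Stats],
    after: dict[str, Stats],
    uncountable: set[str],
) -> tuple[list[tuple[str, int, int]], list[str]]:
    """Return (fuzzy regressions, hard failures) for the baseline languages.

    `uncountable` is assumed to hold languages whose .po file is present
    but could not be counted (e.g. it failed msgfmt).
    """
    regressions: list[tuple[str, int, int]] = []
    hard_failures: list[str] = []
    # Only baseline languages are inspected, so an uncountable catalog
    # that was never in the baseline is ignored.
    for lang, before_stats in sorted(before.items()):
        if lang in uncountable:
            # Present but uncountable: cannot be told apart from a
            # regression, so it is treated as a hard failure.
            hard_failures.append(lang)
            continue
        # A genuinely deleted catalog falls back to zero counts and passes.
        after_stats = after.get(lang, {"translated": 0, "fuzzy": 0})
        if after_stats["fuzzy"] > before_stats["fuzzy"]:
            regressions.append((lang, before_stats["fuzzy"], after_stats["fuzzy"]))
    return regressions, hard_failures
```

   With a baseline of `de`, `fr`, and `it`, where `de` is deleted, `fr` gains 
fuzzy entries, and `it` and a non-baseline `ja` are uncountable, this sketch 
would report `fr` as a regression and `it` as a hard failure, and would stay 
silent about `de` and `ja`.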



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


