Re: [PR] fix(i18n): don't flag intentional string deletions as translation regressions [superset]

via GitHub Wed, 03 Jun 2026 08:46:57 -0700


codeant-ai-for-open-source[bot] commented on code in PR #40716:
URL: https://github.com/apache/superset/pull/40716#discussion_r3349929141



##########
scripts/translations/check_translation_regression.py:
##########
@@ -169,26 +220,32 @@ def cmd_compare(
     report_path: Optional[str] = None,
 ) -> None:
     with open(before_path) as f:
-        before: dict[str, int] = json.load(f)
+        before_raw: dict[str, object] = json.load(f)
+    before = {lang: _normalize(entry) for lang, entry in before_raw.items()}
 
     after = get_counts(translations_dir)
 
+    # A regression is an *increase* in fuzzy entries: the PR's source diff
+    # renamed/reworded strings, leaving their committed translations stranded.
+    # A plain drop in the translated count is NOT used — deleting a string
+    # lowers it identically to a rename but is a legitimate change, and with
+    # `pybabel update --ignore-obsolete` a deletion creates no fuzzy entry.
     regressions: list[tuple[str, int, int]] = []
-    for lang, before_count in sorted(before.items()):
-        after_count = after.get(lang, 0)
-        if after_count < before_count:
-            regressions.append((lang, before_count, after_count))
+    for lang, before_stats in sorted(before.items()):
+        after_stats = after.get(lang, {"translated": 0, "fuzzy": 0})
+        if after_stats["fuzzy"] > before_stats["fuzzy"]:
+            regressions.append((lang, before_stats["fuzzy"], 
after_stats["fuzzy"]))

Review Comment:
   **Suggestion:** Missing-language handling is broken: when a locale exists in 
the baseline but is absent after the PR run, it is treated as `{"translated": 
0, "fuzzy": 0}` and therefore never flagged as a regression. This allows 
accidental deletion of an entire language catalog (or a failed count that drops 
a language) to pass as "No translation regressions." Treat baseline languages 
missing from `after` as a hard failure (or at least a regression) instead of 
defaulting them to zero stats. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ CI translations check misses deleted language catalogs.
   - ❌ Entire locale loss ships without regression warning.
   - ⚠️ Users in affected locale see untranslated or missing text.
   - ⚠️ Translation failures due to msgfmt errors remain undetected.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Observe the CI workflow in `.github/workflows/superset-translations.yml` 
lines 9–15 and
   24–31, where the `Translations` job runs `python
   scripts/translations/check_translation_regression.py --count 
--translations-dir
   /tmp/base-worktree/superset/translations > /tmp/before.json` on the base 
worktree and
   later runs `python scripts/translations/check_translation_regression.py 
--compare
   /tmp/before.json --report /tmp/regression-report.md` on the PR worktree.
   
   2. In the base worktree, ensure a language like `fr` exists with a valid 
catalog at
   `superset/translations/fr/LC_MESSAGES/messages.po`; running the `--count` 
command records
   a baseline JSON `/tmp/before.json` containing an entry such as `{"fr": 
{"translated": 100,
   "fuzzy": 5}}` (verified by the implementation of `cmd_count` and 
`get_counts` in
   `scripts/translations/check_translation_regression.py` lines 91–147 and 
212–215).
   
   3. In the PR worktree, accidentally delete or break the same catalog (for 
example, remove
   `superset/translations/fr/LC_MESSAGES/messages.po` or corrupt it so `msgfmt` 
fails);
   during the `--compare` run, `get_counts()` (lines 129–147) will either omit 
`fr`
   entirely—because `glob("*/LC_MESSAGES/messages.po")` no longer finds it or 
because a
   `CalledProcessError`/`RuntimeError` causes the language to be skipped with a 
warning—and
   return an `after` dict that lacks the `fr` key.
   
   4. In `cmd_compare` at 
`scripts/translations/check_translation_regression.py` lines
   217–237, the loop `for lang, before_stats in sorted(before.items()):` with 
`after_stats =
   after.get(lang, {"translated": 0, "fuzzy": 0})` treats the missing `fr` as 
`{"translated":
   0, "fuzzy": 0}`; since `after_stats["fuzzy"]` (0) is not greater than
   `before_stats["fuzzy"]` (e.g. 5), no regression is appended, the script 
prints "No
   translation regressions.\n" in the summary branch at lines 256–259, and CI 
passes even
   though an entire language catalog was lost or could not be counted.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=282b997ab3ac456988654f46da6c01f9&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=282b997ab3ac456988654f46da6c01f9&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** scripts/translations/check_translation_regression.py
   **Line:** 234:237
   **Comment:**
        *Logic Error: Missing-language handling is broken: when a locale exists 
in the baseline but is absent after the PR run, it is treated as 
`{"translated": 0, "fuzzy": 0}` and therefore never flagged as a regression. 
This allows accidental deletion of an entire language catalog (or a failed 
count that drops a language) to pass as "No translation regressions." Treat 
baseline languages missing from `after` as a hard failure (or at least a 
regression) instead of defaulting them to zero stats.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40716&comment_hash=ec31fbd6dfcaf7b821ee06a07636e17e66fd45792e733e96e622eab646269b00&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40716&comment_hash=ec31fbd6dfcaf7b821ee06a07636e17e66fd45792e733e96e622eab646269b00&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix(i18n): don't flag intentional string deletions as translation regressions [superset]

Reply via email to