codeant-ai-for-open-source[bot] commented on code in PR #40716:
URL: https://github.com/apache/superset/pull/40716#discussion_r3349929141
##########
scripts/translations/check_translation_regression.py:
##########
@@ -169,26 +220,32 @@ def cmd_compare(
report_path: Optional[str] = None,
) -> None:
with open(before_path) as f:
- before: dict[str, int] = json.load(f)
+ before_raw: dict[str, object] = json.load(f)
+ before = {lang: _normalize(entry) for lang, entry in before_raw.items()}
after = get_counts(translations_dir)
+ # A regression is an *increase* in fuzzy entries: the PR's source diff
+ # renamed/reworded strings, leaving their committed translations stranded.
+ # A plain drop in the translated count is NOT used — deleting a string
+ # lowers it identically to a rename but is a legitimate change, and with
+ # `pybabel update --ignore-obsolete` a deletion creates no fuzzy entry.
regressions: list[tuple[str, int, int]] = []
- for lang, before_count in sorted(before.items()):
- after_count = after.get(lang, 0)
- if after_count < before_count:
- regressions.append((lang, before_count, after_count))
+ for lang, before_stats in sorted(before.items()):
+ after_stats = after.get(lang, {"translated": 0, "fuzzy": 0})
+ if after_stats["fuzzy"] > before_stats["fuzzy"]:
+ regressions.append((lang, before_stats["fuzzy"],
after_stats["fuzzy"]))
Review Comment:
**Suggestion:** Missing-language handling is broken: when a locale exists in
the baseline but is absent after the PR run, it is treated as `{"translated":
0, "fuzzy": 0}` and therefore never flagged as a regression. This allows
accidental deletion of an entire language catalog (or a failed count that drops
a language) to pass as "No translation regressions." Treat baseline languages
missing from `after` as a hard failure (or at least a regression) instead of
defaulting them to zero stats. [logic error]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ CI translations check misses deleted language catalogs.
- ❌ Entire locale loss ships without regression warning.
- ⚠️ Users in affected locale see untranslated or missing text.
- ⚠️ Translation failures due to msgfmt errors remain undetected.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Observe the CI workflow in `.github/workflows/superset-translations.yml`
lines 9–15 and
24–31, where the `Translations` job runs `python
scripts/translations/check_translation_regression.py --count
--translations-dir
/tmp/base-worktree/superset/translations > /tmp/before.json` on the base
worktree and
later runs `python scripts/translations/check_translation_regression.py
--compare
/tmp/before.json --report /tmp/regression-report.md` on the PR worktree.
2. In the base worktree, ensure a language like `fr` exists with a valid
catalog at
`superset/translations/fr/LC_MESSAGES/messages.po`; running the `--count`
command records
a baseline JSON `/tmp/before.json` containing an entry such as `{"fr":
{"translated": 100,
"fuzzy": 5}}` (verified by the implementation of `cmd_count` and
`get_counts` in
`scripts/translations/check_translation_regression.py` lines 91–147 and
212–215).
3. In the PR worktree, accidentally delete or break the same catalog (for
example, remove
`superset/translations/fr/LC_MESSAGES/messages.po` or corrupt it so `msgfmt`
fails);
during the `--compare` run, `get_counts()` (lines 129–147) will either omit
`fr`
entirely—because `glob("*/LC_MESSAGES/messages.po")` no longer finds it or
because a
`CalledProcessError`/`RuntimeError` causes the language to be skipped with a
warning—and
return an `after` dict that lacks the `fr` key.
4. In `cmd_compare` at
`scripts/translations/check_translation_regression.py` lines
217–237, the loop `for lang, before_stats in sorted(before.items()):` with
`after_stats =
after.get(lang, {"translated": 0, "fuzzy": 0})` treats the missing `fr` as
`{"translated":
0, "fuzzy": 0}`; since `after_stats["fuzzy"]` (0) is not greater than
`before_stats["fuzzy"]` (e.g. 5), no regression is appended, the script
prints "No
translation regressions.\n" in the summary branch at lines 256–259, and CI
passes even
though an entire language catalog was lost or could not be counted.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=282b997ab3ac456988654f46da6c01f9&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=282b997ab3ac456988654f46da6c01f9&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** scripts/translations/check_translation_regression.py
**Line:** 234:237
**Comment:**
*Logic Error: Missing-language handling is broken: when a locale exists
in the baseline but is absent after the PR run, it is treated as
`{"translated": 0, "fuzzy": 0}` and therefore never flagged as a regression.
This allows accidental deletion of an entire language catalog (or a failed
count that drops a language) to pass as "No translation regressions." Treat
baseline languages missing from `after` as a hard failure (or at least a
regression) instead of defaulting them to zero stats.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40716&comment_hash=ec31fbd6dfcaf7b821ee06a07636e17e66fd45792e733e96e622eab646269b00&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40716&comment_hash=ec31fbd6dfcaf7b821ee06a07636e17e66fd45792e733e96e622eab646269b00&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]