rusackas opened a new pull request, #40909:
URL: https://github.com/apache/superset/pull/40909
### SUMMARY
The `babel-extract` translation-regression check has been failing on
nearly every PR that **adds** a translatable string. Root cause and fix below.
**Root cause.** `scripts/translations/babel_update.sh` runs `pybabel update
--ignore-obsolete` with fuzzy matching left **on** (Babel's default). When a PR
adds a new English `msgid`, Babel's fuzzy matcher guesses a "close" existing
translation and stamps it `#, fuzzy` in *every* language catalog. Since #40706
keyed `check_translation_regression.py` on the **increase in fuzzy entries**
(correctly, so it can ignore intentional deletions), those spurious guesses
read as a regression — so adding a single string failed the check across ~16
catalogs.
That fuzzy-keying change was the right call for deletions; the missing half
was that the *update* step was still synthesising fuzzies for additions too.
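To see why a newly added string picks up a wrong guess, here is a minimal sketch. It assumes Babel's fuzzy matcher behaves like `difflib.get_close_matches` with its default 0.6 cutoff, applied to lower-cased msgids. `guess_translation` and the sample catalog are hypothetical and are not Superset or Babel code.

```python
from difflib import get_close_matches


def guess_translation(new_msgid, catalog, fuzzy_matching=True):
    """Return (msgstr, is_fuzzy) for a msgid that is new to the catalog.

    Hypothetical stand-in for the update step; `catalog` maps existing
    msgids to their translations.
    """
    if fuzzy_matching:
        # Assumption: the closest existing msgid above difflib's default
        # cutoff is reused, and the entry is marked fuzzy.
        candidates = {msgid.lower().strip(): msgid for msgid in catalog}
        close = get_close_matches(new_msgid.lower().strip(), list(candidates), n=1)
        if close:
            return catalog[candidates[close[0]]], True
    # Fuzzy matching disabled (or nothing close enough): stay untranslated.
    return "", False


# Made-up French catalog, for illustration only.
existing = {
    "table name": "nom de la table",
    "Dashboard": "Tableau de bord",
    "Save": "Enregistrer",
}

with_fuzzy = guess_translation("valuename", existing)
without_fuzzy = guess_translation("valuename", existing, fuzzy_matching=False)
print(with_fuzzy)
print(without_fuzzy)
```

Under these assumptions the new `"valuename"` string inherits the unrelated `"table name"` translation when matching is on, and stays empty when it is off.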
**Fix.** Pass `--no-fuzzy-matching` to `pybabel update`. New (and renamed)
strings now land as cleanly **untranslated** (empty `msgstr`) instead of as a
wrong fuzzy guess:
- ✅ Adding a string no longer creates fuzzies → `babel-extract` passes.
- ✅ Catalog quality improves — the matcher was emitting genuinely wrong
guesses (a recent example: a new `"valuename"` string got mapped onto an
unrelated `"table name"` translation).
- ✅ The check still does its job: it now guards against `#, fuzzy` entries
that arrive **another way** (e.g. a hand-edited `.po`), which is the realistic
regression vector.
- Trade-off: a *rename* no longer leaves its old translation behind as a
fuzzy entry for review. The stale translation is simply dropped and the string
is re-translated by the community. Given the matcher's guesses were unreliable
anyway, dropping is cleaner than keeping a wrong one.
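As a rough illustration of the flag change, the sketch below assembles a `pybabel update` invocation in Python. The helper name `build_update_command` and both paths are placeholders, not necessarily what `babel_update.sh` uses; only the `pybabel` options themselves are real.

```python
import shlex


def build_update_command(pot_file, translations_dir, fuzzy_matching=False):
    """Assemble a `pybabel update` invocation as an argument list."""
    cmd = [
        "pybabel", "update",
        "-i", pot_file,          # template (.pot) to merge from
        "-d", translations_dir,  # directory holding the per-language catalogs
        "--ignore-obsolete",
    ]
    if not fuzzy_matching:
        # New msgids stay untranslated instead of receiving a fuzzy guess.
        cmd.append("--no-fuzzy-matching")
    return cmd


# Placeholder paths, for illustration only.
command = build_update_command("translations/messages.pot", "translations")
print(shlex.join(command))
```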
No catalog churn: running `babel_update.sh` on unchanged `master` source
produces no new strings, so the flag changes nothing in the committed
`.po`/`.pot` until a real source change comes through.
### BEFORE/AFTER
- **Before:** add one `_("…")` string → 13–16 catalogs gain a `#, fuzzy`
guess → `babel-extract` ✗.
- **After:** add one `_("…")` string → 13–16 catalogs gain an untranslated
entry, 0 fuzzies → `babel-extract` ✓.
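The before/after difference can be sketched with a toy version of a fuzzy-count comparison. This is not the logic of `check_translation_regression.py`; `count_fuzzy`, `is_regression`, and the sample `.po` snippets are hypothetical and only show why an added fuzzy entry trips a check keyed on the fuzzy count while an untranslated entry does not.

```python
def count_fuzzy(po_text):
    """Count entries whose flag line (`#, ...`) includes the `fuzzy` flag."""
    count = 0
    for line in po_text.splitlines():
        if line.startswith("#,"):
            flags = [flag.strip() for flag in line[2:].split(",")]
            if "fuzzy" in flags:
                count += 1
    return count


def is_regression(base_po, pr_po):
    """Report a regression only when the PR increases the fuzzy count."""
    return count_fuzzy(pr_po) > count_fuzzy(base_po)


# Made-up catalog contents, for illustration only.
base = 'msgid "table name"\nmsgstr "nom de la table"\n'
before_fix = base + '\n#, fuzzy\nmsgid "valuename"\nmsgstr "nom de la table"\n'
after_fix = base + '\nmsgid "valuename"\nmsgstr ""\n'

print(is_regression(base, before_fix))
print(is_regression(base, after_fix))
```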
### TESTING INSTRUCTIONS
`check_translation_regression.py`'s unit tests
(`tests/unit_tests/scripts/translations/check_translation_regression_test.py`)
still pass unchanged: the check logic is untouched, and only its docstring is
updated to describe the new `--no-fuzzy-matching` behaviour. The behavioural
change is exercised by the `babel-extract` job on any string-adding PR.
### ADDITIONAL INFORMATION
- [ ] Has associated issue:
- [ ] Required feature flags:
- [ ] Changes UI
- [ ] Includes DB Migration
Context: `babel-extract` is a **non-required** check, so this was red-✗
noise rather than a merge blocker — but it was hitting nearly every
string-adding PR (and prompting wasteful catalog regens). Already-red PRs clear
once rebased onto this.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]