[PR] fix(i18n): stop fuzzy-matching new strings so adding one doesn't fail babel-extract [superset]

via GitHub Tue, 09 Jun 2026 09:56:18 -0700


rusackas opened a new pull request, #40909:
URL: https://github.com/apache/superset/pull/40909


   ### SUMMARY
   
   The `babel-extract` translation-regression check has been failing on
nearly every PR that **adds** a translatable string. Root cause + fix below.
   
   **Root cause.** `scripts/translations/babel_update.sh` runs `pybabel update 
--ignore-obsolete` with fuzzy matching left **on** (Babel's default). When a PR 
adds a new English `msgid`, Babel's fuzzy matcher guesses a "close" existing 
translation and stamps it `#, fuzzy` in *every* language catalog. Since #40706 
keyed `check_translation_regression.py` on the **increase in fuzzy entries** 
(correctly, so it can ignore intentional deletions), those spurious guesses 
read as a regression — so adding a single string failed the check across ~16 
catalogs.
   
   That fuzzy-keying change was the right call for deletions; the missing half 
was that the *update* step was still synthesising fuzzies for additions too.
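   The regression signal described above can be sketched as a count of `#, fuzzy` entries per catalog, compared before and after the update. This is a hypothetical, simplified stand-in for `check_translation_regression.py` (its actual code is not shown here): `parse_po_entries`, `count_fuzzy` and `fuzzy_regressed` are illustrative names, the parser assumes blank-line-separated `.po` entries, and a real tool would more likely use Babel's own `.po` reader. It skips the header entry (`msgid ""`), which is often flagged fuzzy in fresh catalogs.

```python
import re


def parse_po_entries(po_text: str) -> list[tuple[str, set[str]]]:
    """Return (msgid, flags) for each active entry in a .po file's text."""
    entries = []
    # Assumption: entries are separated by blank lines, as pybabel writes them.
    for block in re.split(r"\n\s*\n", po_text.strip()):
        flags: set[str] = set()
        msgid_parts: list[str] = []
        in_msgid = False
        for raw in block.splitlines():
            line = raw.strip()
            if line.startswith("#,"):
                # Flag comment, e.g. "#, fuzzy, python-format".
                flags.update(f.strip() for f in line[2:].split(","))
                in_msgid = False
            elif line.startswith("msgid "):
                # Drop the surrounding quotes of the string literal.
                msgid_parts.append(line[len("msgid "):].strip()[1:-1])
                in_msgid = True
            elif in_msgid and line.startswith('"'):
                # Continuation line of a multi-line msgid.
                msgid_parts.append(line[1:-1])
            else:
                # msgstr, msgctxt, obsolete "#~" lines, other comments.
                in_msgid = False
        if msgid_parts:
            entries.append(("".join(msgid_parts), flags))
    return entries


def count_fuzzy(po_text: str) -> int:
    """Count fuzzy entries, skipping the header (the entry whose msgid is "")."""
    return sum(1 for msgid, flags in parse_po_entries(po_text)
               if msgid and "fuzzy" in flags)


def fuzzy_regressed(before: str, after: str) -> bool:
    """Hypothetical check: fail only if the number of fuzzy entries went up.

    Deleting strings shrinks the catalog but cannot raise this count,
    so intentional deletions are not reported as regressions.
    """
    return count_fuzzy(after) > count_fuzzy(before)


BEFORE = 'msgid ""\nmsgstr ""\n\nmsgid "table name"\nmsgstr "Tabellenname"\n'
# Fuzzy matching on: the new string arrives as a wrong, fuzzy-flagged guess.
AFTER_FUZZY = BEFORE + '\n#, fuzzy\nmsgid "valuename"\nmsgstr "Tabellenname"\n'
# Fuzzy matching off: the new string arrives untranslated.
AFTER_CLEAN = BEFORE + '\nmsgid "valuename"\nmsgstr ""\n'

print(fuzzy_regressed(BEFORE, AFTER_FUZZY))  # True
print(fuzzy_regressed(BEFORE, AFTER_CLEAN))  # False
```

   A hand-edited `.po` that adds a `#, fuzzy` line still raises the count, so a check of this shape keeps catching that vector.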
   
   **Fix.** Pass `--no-fuzzy-matching` to `pybabel update`. New (and renamed) 
strings now land as cleanly **untranslated** (empty `msgstr`) instead of as a 
wrong fuzzy guess:
   
   - ✅ Adding a string no longer creates fuzzies → `babel-extract` passes.
   - ✅ Catalog quality improves — the matcher was emitting genuinely wrong 
guesses (a recent example: a new `"valuename"` string got mapped onto an 
unrelated `"table name"` translation).
   - ✅ The check still does its job: it now guards against `#, fuzzy` entries 
that arrive **another way** (e.g. a hand-edited `.po`), which is the realistic 
regression vector.
   - Trade-off: a *rename* no longer carries its old translation forward as a
`#, fuzzy` entry for review; the stale translation is simply dropped and the
string is re-translated by the community. Given that the matcher's guesses were
unreliable anyway, dropping the old translation is cleaner than keeping a wrong one.
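   To see why `--no-fuzzy-matching` removes the spurious fuzzies, here is a toy model of the update step. It is not Babel's code: `update_catalog` is a hypothetical helper, and `difflib.get_close_matches` (default cutoff 0.6) is used as a stand-in for Babel's fuzzy matcher, whose exact keys and thresholds may differ. Under those assumptions it reproduces the `"valuename"` → `"table name"` guess mentioned above.

```python
from difflib import get_close_matches


def update_catalog(
    translations: dict[str, str],
    template_msgids: list[str],
    fuzzy_matching: bool = True,
) -> dict[str, tuple[str, bool]]:
    """Toy catalog update: returns {msgid: (msgstr, is_fuzzy)}.

    Old entries whose msgid is no longer in the template are not carried
    over, loosely mirroring `--ignore-obsolete`.
    """
    # Fuzzy candidates: every old msgid that already has a translation,
    # keyed by a lower-cased, stripped form for similarity matching.
    candidates = {
        msgid.lower().strip(): msgid
        for msgid, msgstr in translations.items()
        if msgid and msgstr
    }
    updated: dict[str, tuple[str, bool]] = {}
    for msgid in template_msgids:
        if msgid in translations:
            # Exact match: keep the existing translation as-is.
            updated[msgid] = (translations[msgid], False)
            continue
        if fuzzy_matching:
            # Guess the single closest old msgid (difflib's default cutoff 0.6).
            matches = get_close_matches(msgid.lower().strip(), list(candidates), n=1)
            if matches:
                updated[msgid] = (translations[candidates[matches[0]]], True)
                continue
        # No exact match and no guess: leave the new string untranslated.
        updated[msgid] = ("", False)
    return updated


old = {"table name": "Tabellenname", "Save": "Speichern", "Obsolete label": "Alt"}
template = ["table name", "Save", "valuename"]

print(update_catalog(old, template, fuzzy_matching=True)["valuename"])
# ('Tabellenname', True)   a wrong guess, flagged fuzzy
print(update_catalog(old, template, fuzzy_matching=False)["valuename"])
# ('', False)              cleanly untranslated
```

   If the template contains no new msgids, the flag makes no difference in this model, since only unmatched strings ever reach the fuzzy matcher.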
   
   No catalog churn: running `babel_update.sh` on unchanged `master` source 
produces no new strings, so the flag changes nothing in the committed 
`.po`/`.pot` until a real source change comes through.
   
   ### BEFORE/AFTER
   
   - **Before:** add one `_("…")` string → 13–16 catalogs gain a `#, fuzzy` 
guess → `babel-extract` ✗.
   - **After:** add one `_("…")` string → 13–16 catalogs gain an untranslated 
entry, 0 fuzzies → `babel-extract` ✓.
   
   ### TESTING INSTRUCTIONS
   
   `check_translation_regression.py`'s unit tests 
(`tests/unit_tests/scripts/translations/check_translation_regression_test.py`) 
still pass unchanged (the check logic is untouched; only its docstring is 
updated to describe the new `--no-fuzzy-matching` reality). The behavioural 
change is exercised by the `babel-extract` job on any string-adding PR.
   
   ### ADDITIONAL INFORMATION
   
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration
   
   Context: `babel-extract` is a **non-required** check, so this was red-✗ 
noise rather than a merge blocker — but it was hitting nearly every 
string-adding PR (and prompting wasteful catalog regens). Already-red PRs clear 
once rebased onto this.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
