rusackas opened a new pull request, #41533:
URL: https://github.com/apache/superset/pull/41533

   ### SUMMARY
   
   Non-ASCII text inside **array / struct / JSON column values** (e.g. the CJK 
and Cyrillic strings produced by `array_agg`) was displayed as `\uXXXX` escape 
sequences in SQL Lab, Explore, and on dashboards — the "unicode gibberish" 
reported in #19388 and #22904. Plain string columns were never affected; only 
nested values that get JSON-serialized for the result grid.
   
   The fix is deliberately narrow:
   
   - `superset.utils.json.dumps` gains an opt-in `ensure_ascii: bool = True` 
parameter. The default is unchanged, so metadata serialization keeps escaping 
non-ASCII for narrow-charset columns (notably MySQL `utf8`/`utf8mb3`). In 
particular, the `position_json` emoji escaping from #39737 stays intact.
   - Only the result-set `stringify` path (`superset/result_set.py`) opts into 
`ensure_ascii=False`. That is the single, DRY chokepoint through which 
array/struct values flow to SQL Lab, Explore and dashboards. It affects the 
query result payload only — never anything persisted to the metadata database.
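
The opt-in pattern described in these two bullets can be sketched with the standard library alone. This is an illustration, not the code in this PR: `dumps_sketch` and `stringify_sketch` are hypothetical stand-ins for `superset.utils.json.dumps` and the result-set `stringify` path, and only the `ensure_ascii` parameter name and its default are taken from the description.

```python
import json
from typing import Any


def dumps_sketch(obj: Any, ensure_ascii: bool = True) -> str:
    """Hypothetical wrapper: escapes non-ASCII unless the caller opts out."""
    return json.dumps(obj, ensure_ascii=ensure_ascii)


def stringify_sketch(value: Any) -> str:
    """Hypothetical result-set path: the only caller that opts out."""
    return dumps_sketch(value, ensure_ascii=False)


row = ["Лонгсливы", "Свитшоты"]

# Default path (metadata-style serialization): output is pure ASCII.
escaped = dumps_sketch(row)
# Result-grid path: characters are kept verbatim.
verbatim = stringify_sketch(row)

print(escaped)
print(verbatim)
```

Both strings decode to the same list, so the choice only changes how the payload reads, not what it contains.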
   
   This avoids the breaking change / MySQL charset migration that a global 
`ensure_ascii=False` would have required, and it leaves no regression against 
the #39737 emoji-truncation guard.
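
To make the charset concern concrete: MySQL `utf8mb3` stores at most three bytes per character, while an emoji needs four bytes in UTF-8. The sketch below uses only the standard `json` module (not Superset's wrapper) and a made-up `label` key to show that the default escaping keeps such a value pure ASCII, whereas opting out leaves the raw 4-byte character in the serialized text.

```python
import json

emoji = "😀"  # U+1F600, outside the Basic Multilingual Plane

# Four bytes in UTF-8, one more than a utf8mb3 column can hold per character.
utf8_len = len(emoji.encode("utf-8"))

# The default (ensure_ascii=True) emits a surrogate-pair escape: plain ASCII.
escaped = json.dumps({"label": emoji})
# Opting out keeps the raw 4-byte character in the serialized text.
raw = json.dumps({"label": emoji}, ensure_ascii=False)

print(utf8_len, escaped, raw)
```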
   
   This **supersedes #33720** (same goal, narrower implementation), whose 
author appears to be inactive. Credit to @Quatters, retained as co-author on 
the commit.
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   
   Before (array_agg of Cyrillic text in SQL Lab):
   
   ```
   ["\u041b\u043e\u043d\u0433\u0441\u043b\u0438\u0432\u044b", "\u0421\u0432\u0438\u0442\u0448\u043e\u0442\u044b"]
   ```
   
   After:
   
   ```
   ["Лонгсливы", "Свитшоты"]
   ```
   
   ### TESTING INSTRUCTIONS
   
   Automated (unit):
   
   ```bash
   pytest tests/unit_tests/result_set_test.py tests/unit_tests/utils/json_tests.py
   ```
   
   `test_stringify_values_preserves_non_ascii_characters` reproduces both 
linked issues and fails without the fix. The #39737 emoji tests 
(`tests/integration_tests/dashboards/test_update_emoji.py`) remain green since 
the metadata path is untouched.
   
   Manual:
   1. Point Superset at a Postgres analytics DB containing non-ASCII text.
   2. In SQL Lab, run a query that wraps the text in `array_agg(...)` (or 
select an array/JSON column).
   3. Confirm the result grid shows the characters verbatim, not `\uXXXX`.
   4. Repeat via Explore / a dashboard table chart.
   
   ### ADDITIONAL INFORMATION
   
   - [x] Has associated issue: Closes #19388, Closes #22904
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in 
[SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

