Bungic opened a new pull request, #40801:
URL: https://github.com/apache/superset/pull/40801

   ### SUMMARY
   
   `CSV_EXPORT["encoding"]` currently has no effect on CSV downloads.
   `CsvResponse` sets it through the `Response.charset` class attribute, but
   Werkzeug deprecated `charset` in 2.3 and removed it in 3.0
   ([changelog](https://werkzeug.palletsprojects.com/en/stable/changes/#version-3-0-0)).
   Since Superset pins `werkzeug==3.1.6`, the attribute is dead code and `str`
   bodies are always encoded as plain UTF-8, whatever the config says.
   
   The painful part: the default config is
   `CSV_EXPORT = {"encoding": "utf-8-sig"}`, which promises a BOM that never
   arrives. Without it, Excel decodes exports with the local Windows codepage
   and garbles every non-ASCII character. #36374 (Chinese) and #29410 (Arabic)
   are this bug. We hit it with Turkish data in emailed report CSVs on a
   production 4.1.4 install. It took a while to track down because the
   multi-query ZIP export path encodes manually
   (`query_data.encode(encoding)` in `charts/data/api.py`) and was never
   broken. Only single-query exports are affected.
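
The garbling can be reproduced with the standard library alone. This is an illustrative sketch, not code from this PR: `cp1252` is an assumption standing in for "the local Windows codepage", and the `utf-8-sig` decode stands in for a BOM-aware consumer such as Excel.

```python
# Illustrative only: reproduce the missing-BOM garbling with the stdlib.
text = "Sipariş"  # Turkish sample taken from the byte dumps in this PR

without_bom = text.encode("utf-8")   # what the broken path sends
with_bom = text.encode("utf-8-sig")  # what the default config promises

# No BOM: a cp1252 consumer (assumed codepage) maps every UTF-8 byte
# to a separate character, so the two bytes of "ş" become two glyphs.
garbled = without_bom.decode("cp1252")

# With BOM: a BOM-aware consumer detects UTF-8 and strips the marker.
restored = with_bom.decode("utf-8-sig")

print(without_bom)  # raw bytes, no BOM prefix
print(with_bom)     # same bytes prefixed with EF BB BF
print(garbled)
print(restored)
```

The only difference between the two payloads is the three-byte prefix, which is why the bug is invisible in tools that assume UTF-8.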
   
   This PR encodes `str` bodies in `CsvResponse.__init__`, reading the
   encoding at request time. `bytes` bodies pass through unchanged, so the ZIP
   path keeps working as before. #29506 tried to solve this by pinning
   Werkzeug back to 2.x and was closed; fixing the response class instead lets
   Superset stay on Werkzeug 3.
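
The str/bytes split described above can be sketched as follows. This is a minimal sketch, not the code in this PR: `encode_csv_body` and `DEFAULT_ENCODING` are hypothetical names, and the plain `dict` stands in for the `CSV_EXPORT` config that the real class reads at request time.

```python
from typing import Mapping, Union

# Assumed fallback when the config has no "encoding" key (hypothetical).
DEFAULT_ENCODING = "utf-8"


def encode_csv_body(body: Union[str, bytes], csv_export: Mapping[str, str]) -> bytes:
    """Hypothetical helper mirroring the str/bytes handling described above."""
    if isinstance(body, bytes):
        # Already encoded by the caller (e.g. the ZIP path): pass through.
        return body
    # Look the encoding up on every call so config changes take effect.
    encoding = csv_export.get("encoding", DEFAULT_ENCODING)
    return body.encode(encoding)


config = {"encoding": "utf-8-sig"}  # stand-in for CSV_EXPORT

# A str body gains the BOM; a pre-encoded bytes body is left as-is,
# so it does not end up with a second BOM.
single_query = encode_csv_body(",Sipariş", config)
zip_member = encode_csv_body(",Sipariş".encode("utf-8-sig"), config)
```

Passing bytes through untouched is what keeps the manually encoded ZIP members from being encoded twice.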
   
   First bytes of `/api/v1/chart/<id>/data/?format=csv` with `utf-8-sig` 
configured, before:
   
   ```
   b',Sipari\xc5\x9f Ta'             -> Excel renders "SipariÅŸ"
   ```
   
   after:
   
   ```
   b'\xef\xbb\xbf,Sipari\xc5\x9f'    -> Excel renders "Sipariş"
   ```
   
   Fixes #36374
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   
   N/A, byte-level change (see the byte dumps in the summary).
   
   ### TESTING INSTRUCTIONS
   
   1. Keep `CSV_EXPORT` at its default.
   2. Export any chart with non-ASCII data as CSV, or download SQL Lab results as CSV.
   3. The file now starts with `EF BB BF` (`xxd file.csv | head -1`) and Excel opens it with correct characters.
   4. `pytest tests/unit_tests/views/test_base.py -k csv_response`
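
Where `xxd` is not available (for example on Windows), the check in step 3 can be approximated in Python. This is a small sketch under that assumption; `starts_with_utf8_bom` is a hypothetical helper, not part of Superset or of this PR.

```python
import codecs


def starts_with_utf8_bom(path: str) -> bool:
    """Return True if the file at `path` begins with the UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as fh:
        # Read only as many bytes as the BOM is long and compare them.
        return fh.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8
```

Calling `starts_with_utf8_bom("file.csv")` on a downloaded export should return `True` with this PR applied and the default `CSV_EXPORT`.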
   
   ### ADDITIONAL INFORMATION
   
   - [x] Has associated issue: Fixes #36374
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


