rusackas commented on PR #39756: URL: https://github.com/apache/superset/pull/39756#issuecomment-4594333030
Pushed a fix for the UUID rendering regression reported above.

**Root cause:** PyArrow ≥ 21 infers Python `uuid.UUID` values as the canonical Arrow `uuid` extension type (16-byte fixed binary) instead of raising while building the array. Two paths relied on the old "raise → stringify" behavior and so now surface UUID columns as garbled bytes / `[bytes]`:

1. `SupersetResultSet`: depended on `pa.array()` raising so values fell through to its stringification fallback (this is what the screenshot shows in SQL Lab).
2. `superset/semantic_layers/mapper.py`: round-trips results through `Table.to_pandas()`, which converts the `uuid` extension type to raw bytes.

**Fix:** a shared `stringify_extension_columns(table)` helper that converts any Arrow extension column to its string form (UUIDs → canonical hex), applied at both sites. Plain binary/BLOB columns aren't extension types, so they're untouched and still render as `[bytes]`. Added regression tests for the helper and an end-to-end UUID result set.

@betodealmeida, could you sanity-check the `semantic_layers/mapper.py` change? I wrapped the three `*.results.to_pandas()` calls so extension columns are stringified at the source (before `to_pandas()` turns them into raw bytes). Wanted your eyes on whether that's the right layer for it, or if you'd prefer it handled inside the semantic-layer result construction instead.
