rusackas commented on PR #39756:
URL: https://github.com/apache/superset/pull/39756#issuecomment-4594333030

   Pushed a fix for the UUID rendering regression reported above.
   
   **Root cause:** PyArrow ≥ 21 infers Python `uuid.UUID` values as the 
canonical Arrow `uuid` extension type (16-byte fixed binary) instead of raising 
while building the array. Two paths relied on the old "raise → stringify" 
behavior and so now surface UUID columns as garbled bytes / `[bytes]`:
   
   1. `SupersetResultSet` — depended on `pa.array()` raising so values fell 
through to its stringification fallback (this is what the screenshot shows in 
SQL Lab).
   2. `superset/semantic_layers/mapper.py` — round-trips results through 
`Table.to_pandas()`, which converts the `uuid` extension type to raw bytes.
   
   **Fix:** a shared `stringify_extension_columns(table)` helper that converts 
any Arrow extension column to its string form (UUIDs → canonical hex), applied 
at both sites. Plain binary/BLOB columns aren't extension types, so they're 
untouched and still render as `[bytes]`. Added regression tests for the helper 
and an end-to-end UUID result set.
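
For reference, the "UUIDs → canonical hex" step can be sketched at the level of a single value. This is not the helper from the PR: `uuid_bytes_to_canonical` is a hypothetical name, and the sketch uses only the standard library to show what the 16 raw bytes look like before and after stringification. The real helper works on Arrow tables and decides by column type, not by byte length.

```python
import uuid


def uuid_bytes_to_canonical(raw: bytes) -> str:
    """Hypothetical illustration: 16 raw bytes -> canonical 8-4-4-4-12 string."""
    if len(raw) != 16:
        # A UUID is exactly 16 bytes; anything else is not a UUID payload.
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return str(uuid.UUID(bytes=raw))


original = uuid.UUID("12345678-1234-5678-1234-567812345678")

# The 16-byte storage form of the same value, i.e. what shows up as raw bytes.
raw = original.bytes

# The canonical text form that users expect to see in a result grid.
text = uuid_bytes_to_canonical(raw)
print(repr(raw))
print(text)
```

Note that byte length alone cannot tell a UUID from an arbitrary 16-byte BLOB, which is why keying the conversion on the Arrow extension type (as described above) leaves plain binary columns untouched.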
   
   @betodealmeida — could you sanity-check the `semantic_layers/mapper.py` 
change? I wrapped the three `*.results.to_pandas()` calls so extension columns 
are stringified at the source (before `to_pandas()` turns them into raw bytes). 
I'd like your opinion on whether that's the right layer for it, or whether you'd 
prefer it handled inside the semantic-layer result construction instead.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

