rusackas commented on PR #39756: URL: https://github.com/apache/superset/pull/39756#issuecomment-4654976267
Pushed this forward: rebased on latest `master` and verified it's landable. Summary of where it stands:

**Changelog review (pyarrow 21 → 24):** The only change that actually touches how Superset uses pyarrow is UUID type inference. Since **pyarrow 21**, `pa.array([uuid.UUID(...)])` infers the canonical `arrow.uuid` extension type (16-byte fixed binary) instead of raising. Previously `SupersetResultSet` relied on that raise to route UUID values through its stringification fallback, so without handling them they'd surface in the results grid as garbled bytes. The other notable items (gandiva deprecation, removal of long-deprecated v13/v18 APIs) don't affect any code path we use. No `Table.to_pandas` signature changes (`integer_object_nulls`, `timestamp_as_object` still work), and `pyarrow.parquet` / `pyarrow.feather` / `pyarrow.lib.ArrowException` all still import.

**Breakage found + fixed:** The UUID extension-type regression, in two places:

- `SupersetResultSet` (SQL Lab, Explore/chart data, column introspection)
- `superset/semantic_layers/mapper.py`, which round-trips through `Table.to_pandas()`

Added a shared `stringify_extension_columns(table)` helper that converts any Arrow extension column to its string form (UUIDs → canonical hex) and applied it at both sites. Plain binary/BLOB columns aren't extension types, so they're untouched. Regression tests cover the helper plus an end-to-end UUID result set.

**Dependency floor:** No hard conflict. pyarrow 24 installs cleanly against our pinned `pandas==2.1.4` / `numpy 1.26.4`, and `pyproject.toml` already declares `pyarrow>=24.0.0,<25` alongside `pandas>=2.1.4,<2.4`. Also added `apache-2.0` to the liccheck authorized-licenses list (a transitive dep reports that exact license string).

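
For readers unfamiliar with why UUIDs show up as garbled bytes: the `arrow.uuid` storage is 16 raw bytes per value, and the string form has to be rebuilt from them. The snippet below is a minimal, stdlib-only sketch of that per-value conversion. It is not the PR's `stringify_extension_columns` helper and does not use pyarrow; the function name `uuid_storage_to_text` is hypothetical, and it assumes "canonical hex" means the hyphenated 8-4-4-4-12 form.

```python
import uuid
from typing import Optional


def uuid_storage_to_text(raw: Optional[bytes]) -> Optional[str]:
    """Turn 16-byte UUID storage into hyphenated hex text (hypothetical helper).

    None (a null cell) passes through unchanged.
    """
    if raw is None:
        return None
    if len(raw) != 16:
        # UUID storage is fixed-width; anything else is not a UUID value.
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return str(uuid.UUID(bytes=raw))


original = uuid.UUID("12345678-1234-5678-1234-567812345678")
raw = original.bytes  # what a fixed 16-byte binary cell would hold

print(repr(raw))                  # raw storage: unreadable in a results grid
print(uuid_storage_to_text(raw))  # 12345678-1234-5678-1234-567812345678
```

The length check is only there to make the sketch fail loudly on non-UUID input; how the actual helper treats other extension types is not shown here.
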
**Tests (run locally with pyarrow 24.0.0 + pandas 2.1.4):**

- `result_set` / `dataframe` / `arrow` / `semantic` unit tests: 471 passed, 1 skipped
- `columnar` / `uploader` / `hive` (parquet paths): 80 passed, 1 skipped
- `pre-commit` (mypy, ruff, pylint) clean on all changed files

Landable as-is. Not merging; leaving that to a committer.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
