malon64 opened a new pull request, #41055: URL: https://github.com/apache/superset/pull/41055
<!--- Please write the PR title following the conventions at https://www.conventionalcommits.org/en/v1.0.0/ Example: fix(dashboard): load charts correctly --> ### SUMMARY For Apache Iceberg tables queried through Trino, `database.get_indexes(table)` returns the columns of the Trino `"<table>$partitions"` system table — which for Iceberg are **metadata fields** (`record_count`, `file_count`, `total_size`, `data`, and a nested `partition` ROW), *not* user partition keys. Superset treated them as partition columns and fed them into `latest_partition()` / `_partition_query()`, generating invalid SQL such as: ```sql SELECT * FROM gold."mart_customer_health_score$partitions" ORDER BY data DESC, file_count DESC, record_count DESC, total_size DESC LIMIT 1 ``` which breaks table preview and latest-partition filtering. Reproducible through `apache/superset:6.1.0`. Fixes #25307 (and dup #26449). **Fix.** `TrinoEngineSpec` now filters these out before building partition metadata. A partition index is treated as Iceberg metadata — and dropped — only when **both**: 1. it contains the signature columns `{record_count, file_count, total_size}` (always present on Iceberg `$partitions`), **and** 2. *every* one of its columns is a known Iceberg metadata field. Condition 2 is what makes it safe: an index with any real partition key (e.g. a Hive table partitioned on `ds` that also happens to have a `record_count` column) fails the check and is passed through completely untouched. The filter is applied in `get_extra_table_metadata()` and in an override of `latest_partition()`, so the table-preview path, the `{{ latest_partition() }}` macro, and automatic `where_latest_partition()` filtering are all covered. **Why filter the fields instead of asking Trino for the table type?** We deliberately chose not to detect the Iceberg connector via `system.metadata.catalogs`: - **Durability** — `connector_name` only exists on recent Trino versions (older releases expose `connector_id`, which is just the catalog name and doesn't identify the connector), so a type lookup would be version-fragile and silently wrong on older clusters. - **No extra round-trip / permissions** — the heuristic is purely local to the reflection data Superset already has. A type query adds a round-trip on a UI metadata path and assumes the principal can read the `system` catalog, which some locked-down deployments restrict. - **Same outcome** — even after confirming a table is Iceberg, the correct action is identical to what we do here: skip `$partitions`-based latest-partition logic. So a connector lookup would add cost and fragility without changing behavior. - **Connector-agnostic & non-invasive** — the change only ever *removes* an all-metadata index; it cannot affect Hive/Presto or any non-Iceberg table, and it stays consistent with the engine spec's existing reflection-based design. Trade-off: a genuinely partitioned Iceberg table now shows *no* latest-partition metadata rather than wrong metadata (matching the issue's suggested direction). Reading partition values out of the nested `partition` ROW is a larger, separate enhancement. ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF N/A (backend behavior; no UI change). ### TESTING INSTRUCTIONS Unit tests added in `tests/unit_tests/db_engine_specs/test_trino.py` cover: Iceberg unpartitioned, Iceberg partitioned, a real key alongside coincidental signature-named columns (preserved), and Hive (unchanged). ```bash pytest tests/unit_tests/db_engine_specs/test_trino.py ``` Manual: connect a Trino catalog backed by the Iceberg connector, open an Iceberg table in SQL Lab / dataset preview, and confirm it loads without an `ORDER BY data DESC, file_count DESC, …` error. Confirm a Hive partitioned table still shows its latest partition. ### ADDITIONAL INFORMATION - [x] Has associated issue: #25307 - [ ] Required feature flags: - [ ] Changes UI - [ ] Includes DB Migration - [ ] Introduces new feature or API - [ ] Removes existing feature or API -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
