[PR] fix(trino): ignore Iceberg $partitions metadata fields in partition detection [superset]

via GitHub Mon, 15 Jun 2026 08:03:41 -0700


malon64 opened a new pull request, #41055:
URL: https://github.com/apache/superset/pull/41055


   <!---
   Please write the PR title following the conventions at 
https://www.conventionalcommits.org/en/v1.0.0/
   Example:
   fix(dashboard): load charts correctly
   -->
   
   ### SUMMARY
   
   For Apache Iceberg tables queried through Trino, 
`database.get_indexes(table)` returns the columns of the Trino 
`"<table>$partitions"` system table — which for Iceberg are **metadata fields** 
(`record_count`, `file_count`, `total_size`, `data`, and a nested `partition` 
ROW), *not* user partition keys. Superset treated them as partition columns and 
fed them into `latest_partition()` / `_partition_query()`, generating invalid 
SQL such as:
   
   ```sql
   SELECT * FROM gold."mart_customer_health_score$partitions"
   ORDER BY data DESC, file_count DESC, record_count DESC, total_size DESC
   LIMIT 1
   ```
   
   which breaks table preview and latest-partition filtering. Reproducible 
through `apache/superset:6.1.0`. Fixes #25307 (and dup #26449).
   
   **Fix.** `TrinoEngineSpec` now filters these out before building partition 
metadata. A partition index is treated as Iceberg metadata — and dropped — only 
when **both**:
   1. it contains the signature columns `{record_count, file_count, 
total_size}` (always present on Iceberg `$partitions`), **and**
   2. *every* one of its columns is a known Iceberg metadata field.
   
   Condition 2 is what makes it safe: an index with any real partition key 
(e.g. a Hive table partitioned on `ds` that also happens to have a 
`record_count` column) fails the check and is passed through completely 
untouched. The filter is applied in `get_extra_table_metadata()` and in an 
override of `latest_partition()`, so the table-preview path, the `{{ 
latest_partition() }}` macro, and automatic `where_latest_partition()` 
filtering are all covered.
   
   **Why filter the fields instead of asking Trino for the table type?** We 
deliberately chose not to detect the Iceberg connector via 
`system.metadata.catalogs`:
   - **Durability** — `connector_name` only exists on recent Trino versions 
(older releases expose `connector_id`, which is just the catalog name and 
doesn't identify the connector), so a type lookup would be version-fragile and 
silently wrong on older clusters.
   - **No extra round-trip / permissions** — the heuristic is purely local to 
the reflection data Superset already has. A type query adds a round-trip on a 
UI metadata path and assumes the principal can read the `system` catalog, which 
some locked-down deployments restrict.
   - **Same outcome** — even after confirming a table is Iceberg, the correct 
action is identical to what we do here: skip `$partitions`-based 
latest-partition logic. So a connector lookup would add cost and fragility 
without changing behavior.
   - **Connector-agnostic & non-invasive** — the change only ever *removes* an 
all-metadata index; it cannot affect Hive/Presto or any non-Iceberg table, and 
it stays consistent with the engine spec's existing reflection-based design.
   
   Trade-off: a genuinely partitioned Iceberg table now shows *no* 
latest-partition metadata rather than wrong metadata (matching the issue's 
suggested direction). Reading partition values out of the nested `partition` 
ROW is a larger, separate enhancement.
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   
   N/A (backend behavior; no UI change).
   
   ### TESTING INSTRUCTIONS
   
   Unit tests added in `tests/unit_tests/db_engine_specs/test_trino.py` cover: 
Iceberg unpartitioned, Iceberg partitioned, a real key alongside coincidental 
signature-named columns (preserved), and Hive (unchanged).
   
   ```bash
   pytest tests/unit_tests/db_engine_specs/test_trino.py
   ```
   
   Manual: connect a Trino catalog backed by the Iceberg connector, open an 
Iceberg table in SQL Lab / dataset preview, and confirm it loads without an 
`ORDER BY data DESC, file_count DESC, …` error. Confirm a Hive partitioned 
table still shows its latest partition.
   
   ### ADDITIONAL INFORMATION
   
   - [x] Has associated issue: #25307
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[PR] fix(trino): ignore Iceberg $partitions metadata fields in partition detection [superset]

Reply via email to