huan233usc opened a new pull request, #2575:
URL: https://github.com/apache/iceberg-rust/pull/2575
## Which issue does this PR close?
- Closes #2050
## What changes are included in this PR?
`CREATE EXTERNAL TABLE ... STORED AS ICEBERG` (via
`IcebergTableProviderFactory`) previously rejected any `PARTITIONED BY` clause
outright.
DataFusion's `PARTITIONED BY` grammar only accepts plain column names — it
cannot express Iceberg transforms such as `bucket(16, id)` or `days(ts)`
(unlike Spark's native DSv2 grammar). Given that constraint, this PR:
- Stops rejecting `table_partition_cols` in `check_cmd`.
- Adds `validate_partition_columns`, run after the table is loaded:
  - If the table's default partition spec uses any **non-identity**
    transform, returns a clear `FeatureUnsupported` error naming the offending
    field/transform.
  - Otherwise validates that the declared columns exactly match the identity
    partition columns **in order** (consistent with
    `PartitionSpec::is_compatible_with` and Java's `PartitionSpec.compatibleWith`,
    where field order is significant).
- Omitting `PARTITIONED BY` keeps the previous behavior: any table —
including non-identity partitioned ones — can still be registered for read-only
access.
- A `TODO` is left to support non-identity transforms once DataFusion's
grammar can express them.
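
The validation rules above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the `Transform`, `PartitionField`, and `PartitionCheckError` types are hypothetical stand-ins for the real Iceberg types, and the function signature is assumed for the sake of the example.

```rust
/// Hypothetical, simplified stand-in for an Iceberg partition transform.
#[derive(Debug, Clone, PartialEq)]
pub enum Transform {
    Identity,
    Bucket(u32),
    Day,
}

/// Hypothetical, simplified stand-in for one field of a partition spec.
#[derive(Debug, Clone)]
pub struct PartitionField {
    pub source_column: String,
    pub transform: Transform,
}

/// Hypothetical error type; the real code reports errors differently.
#[derive(Debug, PartialEq)]
pub enum PartitionCheckError {
    FeatureUnsupported(String),
    Mismatch(String),
}

/// Checks the columns declared in `PARTITIONED BY` against the table's
/// default partition spec (assumed signature, for illustration only).
pub fn validate_partition_columns(
    declared: &[&str],
    spec: &[PartitionField],
) -> Result<(), PartitionCheckError> {
    // No PARTITIONED BY clause: keep the previous behavior and accept any table.
    if declared.is_empty() {
        return Ok(());
    }

    // Plain column names cannot express bucket/day/etc., so reject those specs.
    if let Some(field) = spec.iter().find(|f| f.transform != Transform::Identity) {
        return Err(PartitionCheckError::FeatureUnsupported(format!(
            "partition field `{}` uses non-identity transform {:?}",
            field.source_column, field.transform
        )));
    }

    // Identity-only spec: names must match exactly, and order is significant.
    let expected: Vec<&str> = spec.iter().map(|f| f.source_column.as_str()).collect();
    if expected.as_slice() != declared {
        return Err(PartitionCheckError::Mismatch(format!(
            "declared partition columns {:?} do not match table partition columns {:?}",
            declared, expected
        )));
    }
    Ok(())
}
```

Under these assumptions, declaring `(region, event_date)` against a spec with identity fields `region, event_date` passes, while `(event_date, region)` or `(region)` alone yields a mismatch error, and any declared column against a `bucket[4]` spec yields `FeatureUnsupported`.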
### Example
```sql
CREATE EXTERNAL TABLE my_iceberg_table
STORED AS ICEBERG LOCATION '/path/to/metadata.json'
PARTITIONED BY (event_date);
```
## Are these changes tested?
Yes. Added unit tests in `table_provider_factory.rs` plus two metadata
fixtures (bucket-partitioned and multi-identity-partitioned):
- single identity column match / mismatch
- multiple identity columns match / wrong order / subset (count mismatch)
- non-identity (`bucket[4]`) transform rejected with a clear error
- non-identity partitioned table still registers when `PARTITIONED BY` is
omitted
`cargo test -p iceberg-datafusion` and `cargo clippy -p iceberg-datafusion
--all-targets` pass.