fhan688 opened a new pull request, #18899:
URL: https://github.com/apache/hudi/pull/18899
### Describe the issue this Pull Request addresses
Spark SQL CTAS for Hudi tables can write incorrect values for multi-level
partition fields when the partition columns in the SELECT output are not
ordered the same as the table partition spec.
For example, a table created with:
```sql
partitioned by (year, month, day)
```
can receive a CTAS query whose output is:
```sql
select ..., month, day, year
```
The CTAS path currently forwards the resolved query output as-is, so the
downstream write path may match partition field values by position rather
than by name, and so assign them to the wrong fields of the declared table
partition order.
This PR fixes the issue by aligning the CTAS query output with the table's
partition field order during analysis.
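To make the mismatch concrete, here is a toy Python model of the example above. It is not Hudi's write path: both helper functions, the `id` column, and the sample row are hypothetical, and the positional function simply assumes that the trailing columns of a row follow the declared partition order.

```python
# Toy model only: NOT Hudi's write path. It contrasts positional and
# name-based assignment of partition values for the example above.

def partition_values_by_position(partition_spec, row):
    """Assume the trailing len(partition_spec) values follow the spec order."""
    trailing = row[-len(partition_spec):]
    return dict(zip(partition_spec, trailing))


def partition_values_by_name(partition_spec, output_columns, row):
    """Look each partition field up by its column name in the query output."""
    by_name = dict(zip(output_columns, row))
    return {field: by_name[field] for field in partition_spec}


# Table declared as: partitioned by (year, month, day)
partition_spec = ["year", "month", "day"]
# Query output ordered as: select id, month, day, year (hypothetical columns)
output_columns = ["id", "month", "day", "year"]
row = [1, "05", "17", "2024"]

wrong = partition_values_by_position(partition_spec, row)
right = partition_values_by_name(partition_spec, output_columns, row)

print(wrong)  # year receives the month value, month the day, day the year
print(right)  # each field receives its own value
```

In this model the positional reading files the row under `year=05/month=17/day=2024`, which is the kind of incorrect partition value described above.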
### Summary and Changelog
This change aligns CTAS query output with the Hudi table partition field
order before creating `CreateHoodieTableAsSelectCommand`.
Changes:
- Reorder CTAS partition attributes according to
`table.partitionColumnNames` in `ResolveImplementationsEarly`.
- Preserve non-partition columns in their original query output order.
- Use Spark's session resolver for partition field matching.
- Avoid adding a projection when the CTAS output is already aligned.
- Add Spark SQL DDL tests for multi-level partition CTAS with both ordered
and out-of-order partition columns.
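The reordering rules in the list above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `ResolveImplementationsEarly`: the function names are hypothetical, the resolver is a stand-in for Spark's session resolver (case-insensitive unless `spark.sql.caseSensitive` is enabled), and two behaviors are guesses that the description does not specify: reordered partition columns reuse the slots that partition columns already occupy, and the output is left untouched when a partition field cannot be matched.

```python
def case_insensitive_resolver(a, b):
    # Stand-in for Spark's session resolver (assumed case-insensitive here).
    return a.lower() == b.lower()


def align_partition_columns(output, partition_columns,
                            resolver=case_insensitive_resolver):
    """Return the reordered output columns, or None if no projection is needed.

    Assumes column names in `output` are unique.
    """
    def find(name):
        for col in output:
            if resolver(col, name):
                return col
        return None

    # Resolve each declared partition field against the query output.
    resolved = [find(name) for name in partition_columns]
    if any(col is None for col in resolved):
        return None  # assumption: leave the plan alone if a field is missing

    # Slots currently occupied by partition columns, in output order.
    slots = [i for i, col in enumerate(output) if col in resolved]

    # Refill those slots in declared partition order; other columns stay put.
    aligned = list(output)
    for slot, col in zip(slots, resolved):
        aligned[slot] = col

    # Already aligned: signal that no extra projection is required.
    return None if aligned == output else aligned


spec = ["year", "month", "day"]
reordered = align_partition_columns(["id", "name", "month", "day", "year"], spec)
untouched = align_partition_columns(["id", "year", "month", "day"], spec)
mixed_case = align_partition_columns(["id", "Month", "day", "YEAR"], spec)
```

Returning `None` for an already-aligned output mirrors the "avoid adding a projection" item: the caller can keep the original plan instead of wrapping it in a no-op projection.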
### Impact
No public API, config, or storage format changes.
This fixes Spark SQL CTAS behavior for Hudi partitioned tables. CTAS now
correctly handles multi-level partition columns even when the SELECT list
orders partition fields differently from the `PARTITIONED BY` clause.
### Risk Level
low
The change is scoped to Hudi Spark SQL CTAS analysis for resolved Hudi
tables. Non-partitioned CTAS and already-aligned CTAS plans keep the existing
behavior. Verification was added for both COW and MOR table types
through the existing `TestCreateTable` CTAS coverage.
### Documentation Update
none
This is a bug fix with no new feature, config, or public API change.
### Contributor's checklist
- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable