Udit Mehrotra created HUDI-4453:
-----------------------------------
Summary: Support partition pruning for tables Bootstrapped from
Source Hive Style partitioned tables
Key: HUDI-4453
URL: https://issues.apache.org/jira/browse/HUDI-4453
Project: Apache Hudi
Issue Type: Improvement
Reporter: Udit Mehrotra
Currently, the *Bootstrap* feature determines the source schema by reading it
directly from the source Parquet files:
[https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/bootstrap/ParquetBootstrapMetadataHandler.java#L61]
This does not account for source Parquet tables that are Hive-style
partitioned, where the partition column values are encoded in the directory
paths (for example year=2022/month=07) rather than stored in the data files.
As a result, the partition columns are missing from the inferred source schema
and are not written to the target Hudi table. Partition pruning does not work
for the same reason, since the source partitions cannot be pruned. We should
improve this logic to derive the partition schema from the partition paths for
Hive-style partitioned tables, and write the partition column values correctly
to the target Hudi table.
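
To illustrate what deriving partition columns from a Hive-style partition path
could look like, here is a minimal sketch. The class HivePartitionPathParser
and its parse method are hypothetical names used only for illustration. They
are not part of Hudi and are not the proposed implementation. The sketch
assumes the input is a partition path relative to the table base path, and it
returns raw string values without any type inference.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not an existing Hudi class.
final class HivePartitionPathParser {

    private HivePartitionPathParser() {}

    /**
     * Extracts partition column names and raw string values from a relative
     * Hive-style partition path such as "year=2022/month=07".
     * Segments that are not of the form key=value are skipped.
     * Column order follows the order of the path segments.
     */
    static Map<String, String> parse(String relativePartitionPath) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (String segment : relativePartitionPath.split("/")) {
            int eq = segment.indexOf('=');
            if (eq <= 0) {
                continue; // no '=' or empty column name: not a key=value segment
            }
            columns.put(segment.substring(0, eq), segment.substring(eq + 1));
        }
        return columns;
    }
}
```

Calling HivePartitionPathParser.parse("year=2022/month=07") yields a map with
the columns year and month, in that order. Real Hive-style paths may also
escape special characters in values and use a placeholder directory name for
null values. This sketch handles neither case, and the column types would
still have to be resolved separately.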
--
This message was sent by Atlassian Jira
(v8.20.10#820010)