[jira] [Created] (HUDI-4453) Support partition pruning for tables Bootstrapped from Source Hive Style partitioned tables

Udit Mehrotra (Jira) Fri, 22 Jul 2022 16:33:06 -0700

Udit Mehrotra created HUDI-4453:
-----------------------------------

             Summary: Support partition pruning for tables Bootstrapped from 
Source Hive Style partitioned tables
                 Key: HUDI-4453
                 URL: https://issues.apache.org/jira/browse/HUDI-4453
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: Udit Mehrotra



Currently, the *Bootstrap* feature determines the source schema by reading it 
from the source parquet files => 
[https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/bootstrap/ParquetBootstrapMetadataHandler.java#L61]

This does not account for source parquet tables that are Hive style 
partitioned. In such tables the partition column values are typically encoded 
in the partition paths rather than stored in the parquet files, so the 
partition columns are missing from the source schema and are not written to 
the target Hudi table either. As a result, partition pruning does not work, 
because we are unable to prune out source partitions. We should improve this 
logic to determine the partition schema correctly from the partition paths for 
Hive style partitioned tables, and write the partition column values correctly 
to the target Hudi table.
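
To illustrate the idea, here is a minimal Python sketch of recovering 
partition columns from Hive style partition paths and using them to prune 
partitions. It is not Hudi code: the function names 
`parse_hive_partition_path` and `prune_partitions` and the example paths are 
hypothetical, and every value is kept as a string (inferring column types 
would be a separate step).

```python
from urllib.parse import unquote


def parse_hive_partition_path(relative_path):
    """Return the (column, value) pairs encoded in a Hive style partition path.

    Hypothetical helper for illustration only; values stay as strings.
    """
    pairs = []
    for segment in relative_path.strip("/").split("/"):
        # Skip segments that are not in `column=value` form (e.g. a file name).
        if "=" not in segment:
            continue
        column, _, value = segment.partition("=")
        # Partition values may be percent-escaped in the directory name.
        pairs.append((column, unquote(value)))
    return pairs


def prune_partitions(partition_paths, column, wanted_value):
    """Keep only the partition paths whose `column` equals `wanted_value`."""
    return [
        path
        for path in partition_paths
        if dict(parse_hive_partition_path(path)).get(column) == wanted_value
    ]
```

For example, `parse_hive_partition_path("year=2022/month=07")` yields the 
columns `year` and `month`, which could then be added to the schema read from 
the parquet files and used to skip partitions that do not match a filter.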



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
