[ https://issues.apache.org/jira/browse/HUDI-353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lamber-ken resolved HUDI-353.
-----------------------------
Resolution: Resolved
Fixed on master at e555aa516de867a4faf0322e79defa1f52d887ef
> Add support for Hive style partitioning path
> --------------------------------------------
>
> Key: HUDI-353
> URL: https://issues.apache.org/jira/browse/HUDI-353
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Components: Hive Integration
> Reporter: Wenning Ding
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> In Hive, a partition folder name follows this format:
> <partition_column_name>=<partition_value>.
> In Hudi, however, the partition folder name is just <partition_value>.
> For example, take a dataset partitioned by three columns: year, month and day.
> In Hive, the data is saved in:
> {{.../<table_name>/year=2019/month=05/day=01/xxx.parquet}}
> In Hudi, the data is saved in: {{.../<table_name>/2019/05/01/xxx.parquet}}
> I added a new Spark datasource option named
> {{HIVE_STYLE_PARTITIONING_FILED_OPT_KEY}}, which indicates whether to use Hive
> style partitioning. The option defaults to false (Hive style is not used).
> Also, with Hive style partitioning, instead of scanning the dataset and
> manually adding/updating all partitions, we can run "MSCK REPAIR TABLE
> <table_name>" to automatically sync all the partition info with the Hive
> MetaStore.
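The difference between the two folder layouts described above can be sketched in a few lines of Python. This is only an illustration of the naming scheme, not Hudi's implementation: the function name partition_path and its hive_style flag are hypothetical and merely stand in for the boolean option mentioned in the issue.

```python
from typing import Mapping


def partition_path(partition_values: Mapping[str, str], hive_style: bool = False) -> str:
    """Build a relative partition path from ordered column -> value pairs.

    Hypothetical helper for illustration only; it is not part of Hudi.
    hive_style=False mirrors the default described in the issue.
    """
    if hive_style:
        # Hive layout: <partition_column_name>=<partition_value>
        segments = [f"{column}={value}" for column, value in partition_values.items()]
    else:
        # Value-only layout: <partition_value>
        segments = list(partition_values.values())
    return "/".join(segments)


# Dicts keep insertion order, so the columns stay in year/month/day order.
values = {"year": "2019", "month": "05", "day": "01"}

print(partition_path(values))                   # 2019/05/01
print(partition_path(values, hive_style=True))  # year=2019/month=05/day=01
```

With the Hive layout, the column names can be recovered from the folder names alone, which is what lets "MSCK REPAIR TABLE" discover partitions without a manual scan.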
--
This message was sent by Atlassian Jira
(v8.3.4#803005)