[ https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Danny Chen updated HUDI-992:
----------------------------
Fix Version/s: 0.11.0
               (was: 0.10.0)
> For hive-style partitioned source data, partition columns synced with Hive will always have String type
> -------------------------------------------------------------------------------------------------------
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
> Issue Type: Sub-task
> Affects Versions: 0.9.0
> Reporter: Udit Mehrotra
> Assignee: Udit Mehrotra
> Priority: Major
> Fix For: 0.11.0
>
>
> Currently, the bootstrap implementation does not handle partition columns
> correctly when the source data uses *hive-style partitioning* (also noted
> in https://jira.apache.org/jira/browse/HUDI-915).
> The schema inferred during bootstrap and stored in the commit metadata does
> not contain the partition column schema (in the case of hive-style
> partitioned data). As a result, when hive-sync tries to determine the type
> of a partition column from that schema, it does not find the column and
> falls back to the default data type *string*.
> This is where the partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
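The fallback described above can be illustrated with a small sketch. It is not Hudi's actual `HiveSchemaUtil` code: `PartitionTypeFallbackSketch` and `resolvePartitionTypes` are hypothetical names, and the table schema is assumed to be a simple map from field name to Hive type. It only shows why a partition column that is missing from the stored schema ends up as `string`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionTypeFallbackSketch {

    // Hypothetical helper: looks up each partition field in the table schema
    // (field name -> Hive type). If the field is absent, as with the schema
    // stored after bootstrapping hive-style partitioned data, it falls back
    // to "string".
    static Map<String, String> resolvePartitionTypes(Map<String, String> tableSchema,
                                                     List<String> partitionFields) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String field : partitionFields) {
            result.put(field, tableSchema.getOrDefault(field, "string"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Schema as stored in commit metadata after bootstrap: no partition column.
        Map<String, String> bootstrapSchema = new LinkedHashMap<>();
        bootstrapSchema.put("id", "bigint");
        bootstrapSchema.put("name", "string");

        // A schema that does carry the partition column with its real type.
        Map<String, String> fullSchema = new LinkedHashMap<>(bootstrapSchema);
        fullSchema.put("year", "int");

        List<String> partitionFields = List.of("year");
        System.out.println(resolvePartitionTypes(bootstrapSchema, partitionFields)); // {year=string}
        System.out.println(resolvePartitionTypes(fullSchema, partitionFields));      // {year=int}
    }
}
```

With the partition column present in the schema, its real type would be used; with it absent, the lookup silently degrades to `string`.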
>
> Thus, regardless of the partition column's data type in the source data
> (or at least the type Spark infers from the path), it is always synced as
> string.
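To make "what Spark infers from the path" concrete, here is a deliberately simplified sketch of inferring partition column types from a hive-style path such as `year=2020/month=05/region=us`. It is not Spark's implementation: Spark's real inference covers more types (for example decimal, date and timestamp), while this hypothetical `HivePathTypeInferenceSketch` only tries int, bigint and double before defaulting to string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HivePathTypeInferenceSketch {

    // Simplified, hypothetical stand-in for Spark's partition type inference:
    // try int, then bigint, then double, otherwise string.
    static String inferType(String value) {
        try { Integer.parseInt(value); return "int"; } catch (NumberFormatException e) { /* not an int */ }
        try { Long.parseLong(value); return "bigint"; } catch (NumberFormatException e) { /* not a long */ }
        try { Double.parseDouble(value); return "double"; } catch (NumberFormatException e) { /* not a double */ }
        return "string";
    }

    // Parses a hive-style relative partition path ("key=value" segments
    // separated by '/') into partition column name -> inferred type.
    static Map<String, String> inferPartitionTypes(String partitionPath) {
        Map<String, String> types = new LinkedHashMap<>();
        for (String segment : partitionPath.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                types.put(segment.substring(0, eq), inferType(segment.substring(eq + 1)));
            }
        }
        return types;
    }

    public static void main(String[] args) {
        Map<String, String> inferred = inferPartitionTypes("year=2020/month=05/region=us");
        System.out.println(inferred); // {year=int, month=int, region=string}
    }
}
```

Per this issue, even when a column such as `year` is inferred as a numeric type from the path, hive-sync would still register it in Hive as string.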
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)