[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

Danny Chen (Jira) Fri, 26 Nov 2021 20:46:43 -0800


     [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Danny Chen updated HUDI-992:
----------------------------
    Fix Version/s:     (was: 0.10.0)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-992
>                 URL: https://issues.apache.org/jira/browse/HUDI-992
>             Project: Apache Hudi
>          Issue Type: Sub-task
>    Affects Versions: 0.9.0
>            Reporter: Udit Mehrotra
>            Assignee: Udit Mehrotra
>            Priority: Major
>             Fix For: 0.11.0
>
>
> Currently bootstrap implementation is not able to handle partition columns 
> correctly when the source data has *hive-style partitioning*, as is also 
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have partition column schema(in case of hive partitioned 
> data). As a result during hive-sync when hudi tries to determine the type of 
> partition column from that schema, it would not find it and assume the 
> default data type *string*.
> Here is where partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus no matter what the data type of partition column is in the source data 
> (atleast what spark infers it as from the path), it will always be synced as 
> string.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

Reply via email to