[ https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Balaji Varadarajan updated HUDI-992:
------------------------------------
    Status: Open  (was: New)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-992
>                 URL: https://issues.apache.org/jira/browse/HUDI-992
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Udit Mehrotra
>            Priority: Major
>
> Currently, the bootstrap implementation does not handle partition columns
> correctly when the source data uses *hive-style partitioning*, as also
> mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred during bootstrap and stored in the commit metadata does
> not include the partition column schema (in the case of hive-partitioned
> data). As a result, during hive-sync, when Hudi tries to determine the type
> of the partition column from that schema, it does not find it and falls back
> to the default data type *string*.
> Here is where the partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus, regardless of the partition column's data type in the source data
> (or at least the type Spark infers for it from the path), it is always
> synced as string.
>  
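The fallback described in the issue (a partition field missing from the bootstrap schema is synced as *string*) can be illustrated with a minimal Java sketch. It assumes the schema can be reduced to a map from field name to Hive type; the class and method names (`PartitionTypeFallback`, `resolvePartitionTypes`) are hypothetical, and this is not the actual `HiveSchemaUtil` code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the lookup-with-default behavior; not Hudi's HiveSchemaUtil code.
class PartitionTypeFallback {

    // Hive type used when a partition field cannot be found in the schema.
    static final String DEFAULT_PARTITION_TYPE = "string";

    // Looks up each partition field in the schema's field-to-type map;
    // fields absent from the schema fall back to "string".
    static Map<String, String> resolvePartitionTypes(Map<String, String> schemaFieldTypes,
                                                     List<String> partitionFields) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String field : partitionFields) {
            result.put(field, schemaFieldTypes.getOrDefault(field, DEFAULT_PARTITION_TYPE));
        }
        return result;
    }

    public static void main(String[] args) {
        // Schema stored in commit metadata after bootstrap: there is no "year" column,
        // because it only exists in the hive-style path (e.g. .../year=2020/...).
        Map<String, String> bootstrapSchema = new LinkedHashMap<>();
        bootstrapSchema.put("id", "bigint");
        bootstrapSchema.put("name", "string");

        System.out.println(resolvePartitionTypes(bootstrapSchema, List.of("year")));
        // prints {year=string}
    }
}
```

If the partition column were present in the stored schema, its real type would be used instead; the bug is that bootstrap never puts it there.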

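For contrast with the *string* type that hive-sync registers, here is a simplified sketch of how a reader might infer partition column types from a hive-style path such as `year=2020/month=05/region=us-east`. This is a rough approximation under stated assumptions, not Spark's actual partition discovery (which also recognizes decimal, double, date and timestamp values, and can be disabled); the names `HivePathTypeInference` and `inferPartitionTypes` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, hypothetical inference of partition column types from a hive-style path.
// Only distinguishes int, bigint and string; real engines support more types.
class HivePathTypeInference {

    static String inferType(String value) {
        try {
            Integer.parseInt(value);
            return "int";
        } catch (NumberFormatException ignored) {
            // not a 32-bit integer, try a wider type
        }
        try {
            Long.parseLong(value);
            return "bigint";
        } catch (NumberFormatException ignored) {
            // not numeric at all
        }
        return "string";
    }

    // Splits a relative partition path into "key=value" segments and infers each column's type.
    static Map<String, String> inferPartitionTypes(String partitionPath) {
        Map<String, String> types = new LinkedHashMap<>();
        for (String segment : partitionPath.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                types.put(segment.substring(0, eq), inferType(segment.substring(eq + 1)));
            }
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(inferPartitionTypes("year=2020/month=05/region=us-east"));
        // prints {year=int, month=int, region=string}
    }
}
```

Under this assumption, `year` and `month` are numeric when read from the path, yet after bootstrap and hive-sync, as described in the issue, they would be registered in Hive as `string`.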


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
