[jira] [Created] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

Udit Mehrotra (Jira) Wed, 03 Jun 2020 17:55:26 -0700

Udit Mehrotra created HUDI-992:
----------------------------------

             Summary: For hive-style partitioned source data, partition columns 
synced with Hive will always have String type
                 Key: HUDI-992
                 URL: https://issues.apache.org/jira/browse/HUDI-992
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Udit Mehrotra



Currently, the bootstrap implementation cannot handle partition columns 
correctly when the source data uses *hive-style partitioning*, as is also 
mentioned in https://jira.apache.org/jira/browse/HUDI-915

The schema inferred during bootstrap and stored in the commit 
metadata does not include the partition column schema (in the case of hive-style 
partitioned data). As a result, when Hudi tries to determine the type of a 
partition column from that schema during hive-sync, it does not find the column 
and falls back to the default data type *string*.

Here is where the partition column schema is determined for hive-sync:

[https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
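
The fallback described above can be illustrated with a minimal sketch. This is not the actual HiveSchemaUtil code: the class name, the method name, and the use of a plain map from field name to Hive type are all hypothetical, chosen only to show how a lookup with a string default behaves when the partition column is missing from the stored schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; names do not come from the Hudi code base.
public class PartitionTypeSketch {

    // Assumed default, matching the behavior described in this issue.
    static final String DEFAULT_PARTITION_TYPE = "string";

    // Look up the Hive type of a partition field in the schema taken from
    // commit metadata; fall back to the default when the field is absent.
    static String partitionFieldType(Map<String, String> schema, String partitionField) {
        return schema.getOrDefault(partitionField, DEFAULT_PARTITION_TYPE);
    }

    public static void main(String[] args) {
        // Schema as stored after bootstrap: data columns only, no partition column.
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("id", "bigint");
        schema.put("name", "string");

        // "year" is a partition column that exists only in the directory path.
        System.out.println(partitionFieldType(schema, "year")); // prints "string"

        // If the partition column were part of the stored schema, its real type would be used.
        schema.put("year", "int");
        System.out.println(partitionFieldType(schema, "year")); // prints "int"
    }
}
```

Under these assumptions, the wrong type comes from the missing schema entry rather than from the lookup itself.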

 

Thus, no matter what the data type of the partition column is in the source data 
(at least as Spark infers it from the path), it will always be synced as 
string.
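
To make "what Spark infers from the path" concrete, here is a rough sketch of type inference over a hive-style partition path. It is a deliberate simplification and not Spark's implementation: the class and method names are hypothetical, and only int, bigint, double and string are considered, whereas Spark's partition discovery handles more types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified illustration of path-based partition type inference.
public class PathInferenceSketch {

    // Try progressively wider numeric types, then fall back to string.
    static String inferType(String rawValue) {
        try {
            Integer.parseInt(rawValue);
            return "int";
        } catch (NumberFormatException ignored) {
            // not an int, try the next candidate
        }
        try {
            Long.parseLong(rawValue);
            return "bigint";
        } catch (NumberFormatException ignored) {
            // not a long, try the next candidate
        }
        try {
            Double.parseDouble(rawValue);
            return "double";
        } catch (NumberFormatException ignored) {
            // not numeric at all
        }
        return "string";
    }

    // Extract "column=value" segments from a relative path and infer a type for each.
    static Map<String, String> inferPartitionTypes(String relativePath) {
        Map<String, String> types = new LinkedHashMap<>();
        for (String segment : relativePath.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                types.put(segment.substring(0, eq), inferType(segment.substring(eq + 1)));
            }
        }
        return types;
    }

    public static void main(String[] args) {
        // The file name has no '=' and is skipped.
        System.out.println(inferPartitionTypes("year=2020/country=US/part-0001.parquet"));
        // prints {year=int, country=string}
    }
}
```

In this example `year` would be read as an integer column from the source data, yet per the behavior described in this issue it would still be synced to Hive as string.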

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
