Udit Mehrotra created HUDI-607:
----------------------------------

             Summary: Hive sync fails to register tables partitioned by Date 
Type column
                 Key: HUDI-607
                 URL: https://issues.apache.org/jira/browse/HUDI-607
             Project: Apache Hudi (incubating)
          Issue Type: Bug
          Components: Hive Integration
            Reporter: Udit Mehrotra


h2. Issue Description

During Spark-to-Avro conversion, Spark's *Date* type is mapped to Avro's 
*Date Logical Type*, whose underlying physical type in Avro is *Integer*. As a 
result, when the Avro records are formed from Spark rows, each date is 
converted to its *epoch day* (the number of days since 1970-01-01) and stored 
as an *Integer* value in the parquet files.
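
To make the representation concrete, here is a minimal sketch of the epoch-day arithmetic in Python. It is not Hudi or Spark code; the helper name {{to_epoch_day}} is hypothetical and only illustrates how a calendar date maps to the integer that ends up in the Avro record.

```python
from datetime import date

# Avro's date logical type counts days since the Unix epoch.
EPOCH = date(1970, 1, 1)


def to_epoch_day(d: date) -> int:
    """Return the number of days between 1970-01-01 and d (hypothetical helper)."""
    return (d - EPOCH).days


if __name__ == "__main__":
    for d in (date(2019, 1, 1), date(2019, 1, 2), date(2019, 1, 3), date(2019, 1, 4)):
        print(d.isoformat(), "->", to_epoch_day(d))
```

Under this mapping, the dates 2019-01-01 through 2019-01-04 correspond to the integers 17897 through 17900.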

This becomes a problem when a *Date Type* column is chosen as the partition 
column. In that case, Hudi's partition column *_hoodie_partition_path* also 
receives the *epoch day integer* value when the partition field is read from 
the Avro record. As a result, syncing partitions of the Hudi table issues a 
command like the following, where the date is an integer:
{noformat}
ALTER TABLE uditme_hudi.uditme_hudi_events_cow_feb05_00 ADD IF NOT EXISTS   
PARTITION (event_date='17897') LOCATION 
's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17897'
   PARTITION (event_date='17898') LOCATION 
's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17898'
   PARTITION (event_date='17899') LOCATION 
's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17899'
   PARTITION (event_date='17900') LOCATION 
's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17900'{noformat}
Hive cannot make sense of partition field values like *17897*, because it 
cannot convert such a string to the corresponding date. It expects the actual 
date to be represented in string form.

So, we need to make sure that Hudi's partition field gets the actual date value 
in string form instead of the integer. This change makes sure that when a 
field's value is retrieved from the Avro record, we check whether it has the 
*Date Logical Type* and, if so, return the actual date value instead of the 
epoch day. After this change, the command issued to sync partitions looks like:
{noformat}
ALTER TABLE `uditme_hudi`.`uditme_hudi_events_cow_feb05_01` ADD IF NOT EXISTS   
PARTITION (`event_date`='2019-01-01') LOCATION 
's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-01'
   PARTITION (`event_date`='2019-01-02') LOCATION 
's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-02'
   PARTITION (`event_date`='2019-01-03') LOCATION 
's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-03'
   PARTITION (`event_date`='2019-01-04') LOCATION 
's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-04'{noformat}
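
The idea behind the change can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the actual Hudi patch: {{partition_value}} and {{add_partition_clauses}} are hypothetical helpers, and the logical type is passed in as a plain string ("date" is the name Avro uses for this logical type) rather than read from a real Avro schema.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def partition_value(value, logical_type=None):
    """Return the partition value as a string (hypothetical helper).

    If the field carries the 'date' logical type and holds an epoch-day
    integer, convert it to an ISO date string; otherwise stringify as-is.
    """
    if logical_type == "date" and isinstance(value, int):
        return (EPOCH + timedelta(days=value)).isoformat()
    return str(value)


def add_partition_clauses(column, values, base_path):
    """Build one PARTITION ... LOCATION clause per value (hypothetical helper)."""
    return [
        f"PARTITION (`{column}`='{v}') LOCATION '{base_path}/{v}'"
        for v in values
    ]


if __name__ == "__main__":
    raw = [17897, 17898, 17899, 17900]
    fixed = [partition_value(v, "date") for v in raw]
    for clause in add_partition_clauses("event_date", fixed, "s3://bucket/table"):
        print(clause)
```

Without the logical-type check, the same values would be stringified as 17897 and so on, which is the form Hive rejects. The bucket path in the usage example is a placeholder.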
h2. Stack Trace
{noformat}
20/01/13 23:28:04 INFO HoodieHiveClient: Last commit time synced is not known, 
listing all partitions in 
s3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar,FS
 :com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem@1f0c8e1f
20/01/13 23:28:08 INFO HiveSyncTool: Storage partitions scan complete. Found 31
20/01/13 23:28:08 INFO HiveSyncTool: New Partitions [18206, 18207, 18208, 
18209, 18210, 18211, 18212, 18213, 18214, 18215, 18216, 18217, 18218, 18219, 
18220, 18221, 18222, 18223, 18224, 18225, 18226, 18227, 18228, 18229, 18230, 
18231, 18232, 18233, 18234, 18235, 18236]
20/01/13 23:28:08 INFO HoodieHiveClient: Adding partitions 31 to table 
fact_hourly_search_term_conversions_hudi_mor_hudi_jar
20/01/13 23:28:08 INFO HoodieHiveClient: Executing SQL ALTER TABLE 
default.fact_hourly_search_term_conversions_hudi_mor_hudi_jar ADD IF NOT EXISTS 
  PARTITION (dim_date='18206') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18206'
   PARTITION (dim_date='18207') LOCATION $
s3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18207'
   PARTITION (dim_date='18208') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18208'
   PARTITION (dim_date='18209') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_$
n_read_aws_hudi_jar/18209'   PARTITION (dim_date='18210') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18210'
   PARTITION (dim_date='18211') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18211'
   PARTITION (dim_date='18212') L$
CATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18212'
   PARTITION (dim_date='18213') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18213'
   PARTITION (dim_date='18214') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversion$
/merge_on_read_aws_hudi_jar/18214'   PARTITION (dim_date='18215') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18215'
   PARTITION (dim_date='18216') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18216'
   PARTITION (dim_date='1$
217') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18217'
   PARTITION (dim_date='18218') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18218'
   PARTITION (dim_date='18219') LOCATION 
's3://feichi-test/fact_hourly_search_term_co$
versions/merge_on_read_aws_hudi_jar/18219'   PARTITION (dim_date='18220') 
LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18220'
   PARTITION (dim_date='18221') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18221'
   PARTITION (dim$
date='18222') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18222'
   PARTITION (dim_date='18223') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18223'
   PARTITION (dim_date='18224') LOCATION 's3://feichi-test/fact_hourly_search$
term_conversions/merge_on_read_aws_hudi_jar/18224'   PARTITION 
(dim_date='18225') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18225'
   PARTITION (dim_date='18226') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18226'
   PARTIT$
ON (dim_date='18227') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18227'
   PARTITION (dim_date='18228') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18228'
   PARTITION (dim_date='18229') LOCATION 's3://feichi-test/fact_hourl$
_search_term_conversions/merge_on_read_aws_hudi_jar/18229'   PARTITION 
(dim_date='18230') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18230'
   PARTITION (dim_date='18231') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18231'
 PARTITION (dim_date='18232') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18232'
   PARTITION (dim_date='18233') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18233'
   PARTITION (dim_date='18234') LOCATION 's3://feichi-test/fa$
t_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18234'   PARTITION 
(dim_date='18235') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18235'
   PARTITION (dim_date='18236') LOCATION 
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/
18236'
org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for 
table fact_hourly_search_term_conversions_hudi_mor_hudi_jar
  at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:177)
  at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:107)
  at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:71)
  at 
org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:236)
  at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169){noformat}
 



