[
https://issues.apache.org/jira/browse/HUDI-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bhavani Sudha updated HUDI-607:
-------------------------------
Fix Version/s: 0.5.3
> Hive sync fails to register tables partitioned by Date Type column
> ------------------------------------------------------------------
>
> Key: HUDI-607
> URL: https://issues.apache.org/jira/browse/HUDI-607
> Project: Apache Hudi (incubating)
> Issue Type: Bug
> Components: Hive Integration
> Reporter: Udit Mehrotra
> Assignee: Udit Mehrotra
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.0, 0.5.3
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> h2. Issue Description
> As part of spark to avro conversion, Spark's *Date* type is represented as
> corresponding *Date Logical Type* in Avro, which is underneath represented in
> Avro by physical *Integer* type. For this reason when forming the Avro
> records from Spark rows, it is converted to corresponding *Epoch day* to be
> stored as corresponding *Integer* value in the parquet files.
> However, this manifests into a problem that when a *Date Type* column is
> chosen as partition column. In this case, Hudi's partition column
> *_hoodie_partition_path* also gets the corresponding *epoch day integer*
> value when reading the partition field from the avro record, and as a result
> syncing partitions in hudi table issues a command like the following, where
> the date is an integer:
> {noformat}
> ALTER TABLE uditme_hudi.uditme_hudi_events_cow_feb05_00 ADD IF NOT EXISTS
> PARTITION (event_date='17897') LOCATION
> 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17897'
> PARTITION (event_date='17898') LOCATION
> 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17898'
> PARTITION (event_date='17899') LOCATION
> 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17899'
> PARTITION (event_date='17900') LOCATION
> 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17900'{noformat}
> Hive is not able to make sense of the partition field values like *17897* as
> it is not able to convert it to corresponding date from this string. It
> actually expects the actual date to be represented in string form.
> So, we need to make sure that Hudi's partition field gets the actual date
> value in string form, instead of the integer. This change makes sure that
> when a fields value is retrieved from the Avro record, we check that if its
> *Date Logical Type* we return the actual date value, instead of the epoch.
> After this change the command for sync partitions issues is like:
> {noformat}
> ALTER TABLE `uditme_hudi`.`uditme_hudi_events_cow_feb05_01` ADD IF NOT EXISTS
> PARTITION (`event_date`='2019-01-01') LOCATION
> 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-01'
> PARTITION (`event_date`='2019-01-02') LOCATION
> 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-02'
> PARTITION (`event_date`='2019-01-03') LOCATION
> 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-03'
> PARTITION (`event_date`='2019-01-04') LOCATION
> 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-04'{noformat}
> h2. Stack Trace
> {noformat}
> 20/01/13 23:28:04 INFO HoodieHiveClient: Last commit time synced is not
> known, listing all partitions in
> s3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar,FS
> :com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem@1f0c8e1f
> 20/01/13 23:28:08 INFO HiveSyncTool: Storage partitions scan complete. Found
> 31
> 20/01/13 23:28:08 INFO HiveSyncTool: New Partitions [18206, 18207, 18208,
> 18209, 18210, 18211, 18212, 18213, 18214, 18215, 18216, 18217, 18218, 18219,
> 18220, 18221, 18222, 18223, 18224, 18225, 18226, 18227, 18228, 18229, 18230,
> 18231, 18232, 18233, 18234, 18235, 18236]
> 20/01/13 23:28:08 INFO HoodieHiveClient: Adding partitions 31 to table
> fact_hourly_search_term_conversions_hudi_mor_hudi_jar
> 20/01/13 23:28:08 INFO HoodieHiveClient: Executing SQL ALTER TABLE
> default.fact_hourly_search_term_conversions_hudi_mor_hudi_jar ADD IF NOT
> EXISTS PARTITION (dim_date='18206') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18206'
> PARTITION (dim_date='18207') LOCATION $
> s3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18207'
> PARTITION (dim_date='18208') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18208'
> PARTITION (dim_date='18209') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_$
> n_read_aws_hudi_jar/18209' PARTITION (dim_date='18210') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18210'
> PARTITION (dim_date='18211') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18211'
> PARTITION (dim_date='18212') L$
> CATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18212'
> PARTITION (dim_date='18213') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18213'
> PARTITION (dim_date='18214') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversion$
> /merge_on_read_aws_hudi_jar/18214' PARTITION (dim_date='18215') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18215'
> PARTITION (dim_date='18216') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18216'
> PARTITION (dim_date='1$
> 217') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18217'
> PARTITION (dim_date='18218') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18218'
> PARTITION (dim_date='18219') LOCATION
> 's3://feichi-test/fact_hourly_search_term_co$
> versions/merge_on_read_aws_hudi_jar/18219' PARTITION (dim_date='18220')
> LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18220'
> PARTITION (dim_date='18221') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18221'
> PARTITION (dim$
> date='18222') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18222'
> PARTITION (dim_date='18223') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18223'
> PARTITION (dim_date='18224') LOCATION 's3://feichi-test/fact_hourly_search$
> term_conversions/merge_on_read_aws_hudi_jar/18224' PARTITION
> (dim_date='18225') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18225'
> PARTITION (dim_date='18226') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18226'
> PARTIT$
> ON (dim_date='18227') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18227'
> PARTITION (dim_date='18228') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18228'
> PARTITION (dim_date='18229') LOCATION 's3://feichi-test/fact_hourl$
> _search_term_conversions/merge_on_read_aws_hudi_jar/18229' PARTITION
> (dim_date='18230') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18230'
> PARTITION (dim_date='18231') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18231'
> PARTITION (dim_date='18232') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18232'
> PARTITION (dim_date='18233') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18233'
> PARTITION (dim_date='18234') LOCATION 's3://feichi-test/fa$
> t_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18234'
> PARTITION (dim_date='18235') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18235'
> PARTITION (dim_date='18236') LOCATION
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/
> 18236'
> org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for
> table fact_hourly_search_term_conversions_hudi_mor_hudi_jar
> at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:177)
> at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:107)
> at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:71)
> at
> org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:236)
> at
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169){noformat}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)