Udit Mehrotra created HUDI-607:
----------------------------------
Summary: Hive sync fails to register tables partitioned by Date Type column
Key: HUDI-607
URL: https://issues.apache.org/jira/browse/HUDI-607
Project: Apache Hudi (incubating)
Issue Type: Bug
Components: Hive Integration
Reporter: Udit Mehrotra
h2. Issue Description
As part of the Spark-to-Avro conversion, Spark's *Date* type is represented as
the corresponding *Date Logical Type* in Avro, which is backed by the Avro
physical *Integer* type. For this reason, when Avro records are formed from
Spark rows, each date is converted to its *Epoch day* (the number of days since
1970-01-01) and stored as the corresponding *Integer* value in the parquet files.
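To make the encoding concrete, the sketch below uses only `java.time` to show the round trip between a calendar date and the *Epoch day* integer that an Avro *date* logical type stores (days since 1970-01-01, per the Avro specification). It does not call Spark or Avro; the class name `EpochDayDemo` is hypothetical and used for illustration only.

```java
import java.time.LocalDate;

// Hypothetical illustration class: not part of Hudi, Spark, or Avro.
public class EpochDayDemo {

    // Encode a date the way an Avro "date" logical type stores it:
    // as an int counting days since 1970-01-01.
    public static int toEpochDay(LocalDate date) {
        return Math.toIntExact(date.toEpochDay());
    }

    // Decode the stored int back into a calendar date.
    public static LocalDate fromEpochDay(int epochDay) {
        return LocalDate.ofEpochDay(epochDay);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2019, 1, 1);
        int stored = toEpochDay(date);
        // This integer is the value written to the parquet file, and the value
        // that ends up in the partition path when no conversion is applied.
        System.out.println(stored);               // 17897
        System.out.println(fromEpochDay(stored)); // 2019-01-01
    }
}
```

The value 17897 matches the first partition in the failing command shown below, which is why that partition corresponds to 2019-01-01 after the fix.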
However, this causes a problem when a *Date Type* column is chosen as the
partition column. In this case, Hudi's partition column *_hoodie_partition_path*
also gets the *epoch day integer* value when the partition field is read from
the Avro record. As a result, syncing the Hudi table's partitions to Hive issues
a command like the following, where each date is an integer:
{noformat}
ALTER TABLE uditme_hudi.uditme_hudi_events_cow_feb05_00 ADD IF NOT EXISTS
PARTITION (event_date='17897') LOCATION
's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17897'
PARTITION (event_date='17898') LOCATION
's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17898'
PARTITION (event_date='17899') LOCATION
's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17899'
PARTITION (event_date='17900') LOCATION
's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17900'{noformat}
Hive cannot make sense of partition field values like *17897*, because it
cannot convert such a string to the corresponding date. It expects the actual
date in string form (for example, *2019-01-01*).
So we need to make sure that Hudi's partition field gets the actual date value
in string form instead of the integer. This change makes sure that when a
field's value is retrieved from the Avro record, we check whether the field is
of *Date Logical Type* and, if so, return the actual date value instead of the
epoch day. After this change, the partition sync command looks like:
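A minimal sketch of the idea behind the fix, assuming the partition value and the name of its field's logical type are already in hand. In real Avro code the check would typically be `schema.getLogicalType() instanceof LogicalTypes.Date`; here the logical type is passed as a plain string (`"date"`, the name the Avro specification uses) so the example runs with the JDK alone. The class `PartitionValueSketch` and method `partitionValue` are hypothetical and are not the actual Hudi change.

```java
import java.time.LocalDate;
import java.util.Objects;

// Hypothetical sketch: not Hudi's actual implementation.
public class PartitionValueSketch {

    // Returns the string to use as a partition value.
    // If the field carries the Avro "date" logical type, the stored int
    // (days since 1970-01-01) is rendered as an ISO-8601 date such as
    // 2019-01-01, which Hive can parse. Otherwise the value's plain string
    // form is used unchanged.
    public static String partitionValue(Object value, String logicalTypeName) {
        if ("date".equals(logicalTypeName) && value instanceof Integer) {
            return LocalDate.ofEpochDay((Integer) value).toString();
        }
        return Objects.toString(value);
    }

    public static void main(String[] args) {
        System.out.println(partitionValue(17897, "date")); // 2019-01-01
        System.out.println(partitionValue(17897, null));   // 17897
    }
}
```

The conversion is keyed off the schema's logical type rather than the value's Java type because a plain `int` column and a `date` column are identical at the physical level; only the logical type tells them apart.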
{noformat}
ALTER TABLE `uditme_hudi`.`uditme_hudi_events_cow_feb05_01` ADD IF NOT EXISTS
PARTITION (`event_date`='2019-01-01') LOCATION
's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-01'
PARTITION (`event_date`='2019-01-02') LOCATION
's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-02'
PARTITION (`event_date`='2019-01-03') LOCATION
's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-03'
PARTITION (`event_date`='2019-01-04') LOCATION
's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-04'{noformat}
h2. Stack Trace
{noformat}
20/01/13 23:28:04 INFO HoodieHiveClient: Last commit time synced is not known,
listing all partitions in
s3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar,FS:com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem@1f0c8e1f
20/01/13 23:28:08 INFO HiveSyncTool: Storage partitions scan complete. Found 31
20/01/13 23:28:08 INFO HiveSyncTool: New Partitions [18206, 18207, 18208,
18209, 18210, 18211, 18212, 18213, 18214, 18215, 18216, 18217, 18218, 18219,
18220, 18221, 18222, 18223, 18224, 18225, 18226, 18227, 18228, 18229, 18230,
18231, 18232, 18233, 18234, 18235, 18236]
20/01/13 23:28:08 INFO HoodieHiveClient: Adding partitions 31 to table
fact_hourly_search_term_conversions_hudi_mor_hudi_jar
20/01/13 23:28:08 INFO HoodieHiveClient: Executing SQL ALTER TABLE
default.fact_hourly_search_term_conversions_hudi_mor_hudi_jar ADD IF NOT EXISTS
PARTITION (dim_date='18206') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18206'
PARTITION (dim_date='18207') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18207'
PARTITION (dim_date='18208') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18208'
PARTITION (dim_date='18209') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18209'
PARTITION (dim_date='18210') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18210'
PARTITION (dim_date='18211') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18211'
PARTITION (dim_date='18212') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18212'
PARTITION (dim_date='18213') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18213'
PARTITION (dim_date='18214') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18214'
PARTITION (dim_date='18215') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18215'
PARTITION (dim_date='18216') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18216'
PARTITION (dim_date='18217') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18217'
PARTITION (dim_date='18218') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18218'
PARTITION (dim_date='18219') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18219'
PARTITION (dim_date='18220') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18220'
PARTITION (dim_date='18221') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18221'
PARTITION (dim_date='18222') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18222'
PARTITION (dim_date='18223') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18223'
PARTITION (dim_date='18224') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18224'
PARTITION (dim_date='18225') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18225'
PARTITION (dim_date='18226') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18226'
PARTITION (dim_date='18227') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18227'
PARTITION (dim_date='18228') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18228'
PARTITION (dim_date='18229') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18229'
PARTITION (dim_date='18230') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18230'
PARTITION (dim_date='18231') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18231'
PARTITION (dim_date='18232') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18232'
PARTITION (dim_date='18233') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18233'
PARTITION (dim_date='18234') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18234'
PARTITION (dim_date='18235') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18235'
PARTITION (dim_date='18236') LOCATION
's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18236'
org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for
table fact_hourly_search_term_conversions_hudi_mor_hudi_jar
at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:177)
at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:107)
at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:71)
at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:236)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169){noformat}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)