[ 
https://issues.apache.org/jira/browse/HUDI-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha updated HUDI-607:
-------------------------------
    Fix Version/s: 0.5.3

> Hive sync fails to register tables partitioned by Date Type column
> ------------------------------------------------------------------
>
>                 Key: HUDI-607
>                 URL: https://issues.apache.org/jira/browse/HUDI-607
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>          Components: Hive Integration
>            Reporter: Udit Mehrotra
>            Assignee: Udit Mehrotra
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.6.0, 0.5.3
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> h2. Issue Description
> As part of spark to avro conversion, Spark's *Date* type is represented as 
> corresponding *Date Logical Type* in Avro, which is underneath represented in 
> Avro by physical *Integer* type. For this reason when forming the Avro 
> records from Spark rows, it is converted to corresponding *Epoch day* to be 
> stored as corresponding *Integer* value in the parquet files.
> However, this manifests into a problem that when a *Date Type* column is 
> chosen as partition column. In this case, Hudi's partition column 
> *_hoodie_partition_path* also gets the corresponding *epoch day integer* 
> value when reading the partition field from the avro record, and as a result 
> syncing partitions in hudi table issues a command like the following, where 
> the date is an integer:
> {noformat}
> ALTER TABLE uditme_hudi.uditme_hudi_events_cow_feb05_00 ADD IF NOT EXISTS   
> PARTITION (event_date='17897') LOCATION 
> 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17897'
>    PARTITION (event_date='17898') LOCATION 
> 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17898'
>    PARTITION (event_date='17899') LOCATION 
> 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17899'
>    PARTITION (event_date='17900') LOCATION 
> 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_00/17900'{noformat}
> Hive is not able to make sense of the partition field values like *17897* as 
> it is not able to convert it to corresponding date from this string. It 
> actually expects the actual date to be represented in string form.
> So, we need to make sure that Hudi's partition field gets the actual date 
> value in string form, instead of the integer. This change makes sure that 
> when a fields value is retrieved from the Avro record, we check that if its 
> *Date Logical Type* we return the actual date value, instead of the epoch. 
> After this change the command for sync partitions issues is like:
> {noformat}
> ALTER TABLE `uditme_hudi`.`uditme_hudi_events_cow_feb05_01` ADD IF NOT EXISTS 
>   PARTITION (`event_date`='2019-01-01') LOCATION 
> 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-01'
>    PARTITION (`event_date`='2019-01-02') LOCATION 
> 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-02'
>    PARTITION (`event_date`='2019-01-03') LOCATION 
> 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-03'
>    PARTITION (`event_date`='2019-01-04') LOCATION 
> 's3://emr-users/uditme/hudi/tables/events/uditme_hudi_events_cow_feb05_01/2019-01-04'{noformat}
> h2. Stack Trace
> {noformat}
> 20/01/13 23:28:04 INFO HoodieHiveClient: Last commit time synced is not 
> known, listing all partitions in 
> s3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar,FS
>  :com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem@1f0c8e1f
> 20/01/13 23:28:08 INFO HiveSyncTool: Storage partitions scan complete. Found 
> 31
> 20/01/13 23:28:08 INFO HiveSyncTool: New Partitions [18206, 18207, 18208, 
> 18209, 18210, 18211, 18212, 18213, 18214, 18215, 18216, 18217, 18218, 18219, 
> 18220, 18221, 18222, 18223, 18224, 18225, 18226, 18227, 18228, 18229, 18230, 
> 18231, 18232, 18233, 18234, 18235, 18236]
> 20/01/13 23:28:08 INFO HoodieHiveClient: Adding partitions 31 to table 
> fact_hourly_search_term_conversions_hudi_mor_hudi_jar
> 20/01/13 23:28:08 INFO HoodieHiveClient: Executing SQL ALTER TABLE 
> default.fact_hourly_search_term_conversions_hudi_mor_hudi_jar ADD IF NOT 
> EXISTS   PARTITION (dim_date='18206') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18206'
>    PARTITION (dim_date='18207') LOCATION $
> s3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18207'
>    PARTITION (dim_date='18208') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18208'
>    PARTITION (dim_date='18209') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_$
> n_read_aws_hudi_jar/18209'   PARTITION (dim_date='18210') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18210'
>    PARTITION (dim_date='18211') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18211'
>    PARTITION (dim_date='18212') L$
> CATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18212'
>    PARTITION (dim_date='18213') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18213'
>    PARTITION (dim_date='18214') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversion$
> /merge_on_read_aws_hudi_jar/18214'   PARTITION (dim_date='18215') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18215'
>    PARTITION (dim_date='18216') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18216'
>    PARTITION (dim_date='1$
> 217') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18217'
>    PARTITION (dim_date='18218') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18218'
>    PARTITION (dim_date='18219') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_co$
> versions/merge_on_read_aws_hudi_jar/18219'   PARTITION (dim_date='18220') 
> LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18220'
>    PARTITION (dim_date='18221') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18221'
>    PARTITION (dim$
> date='18222') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18222'
>    PARTITION (dim_date='18223') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18223'
>    PARTITION (dim_date='18224') LOCATION 's3://feichi-test/fact_hourly_search$
> term_conversions/merge_on_read_aws_hudi_jar/18224'   PARTITION 
> (dim_date='18225') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18225'
>    PARTITION (dim_date='18226') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18226'
>    PARTIT$
> ON (dim_date='18227') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18227'
>    PARTITION (dim_date='18228') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18228'
>    PARTITION (dim_date='18229') LOCATION 's3://feichi-test/fact_hourl$
> _search_term_conversions/merge_on_read_aws_hudi_jar/18229'   PARTITION 
> (dim_date='18230') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18230'
>    PARTITION (dim_date='18231') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18231'
>  PARTITION (dim_date='18232') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18232'
>    PARTITION (dim_date='18233') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18233'
>    PARTITION (dim_date='18234') LOCATION 's3://feichi-test/fa$
> t_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18234'   
> PARTITION (dim_date='18235') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/18235'
>    PARTITION (dim_date='18236') LOCATION 
> 's3://feichi-test/fact_hourly_search_term_conversions/merge_on_read_aws_hudi_jar/
> 18236'
> org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for 
> table fact_hourly_search_term_conversions_hudi_mor_hudi_jar
>   at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:177)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:107)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:71)
>   at 
> org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:236)
>   at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169){noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to