[ 
https://issues.apache.org/jira/browse/HUDI-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-8311:
--------------------------------------
    Fix Version/s: 1.0.2

> Support YYYY/MM/DD partition format with hive
> ---------------------------------------------
>
>                 Key: HUDI-8311
>                 URL: https://issues.apache.org/jira/browse/HUDI-8311
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Lokesh Jain
>            Assignee: Lokesh Jain
>            Priority: Critical
>             Fix For: 1.0.1, 1.0.2
>
>
> Currently, a partition output format like YYYY/MM/DD fails when syncing with Hive. This Jira aims to add a fix so that such a format is supported.
> Steps to reproduce: the table created below uses a custom key generator combining a simple and a timestamp key generator. The timestamp key generator produces output in the format YYYY/MM/DD.
> {code:java}
> import org.apache.hudi.HoodieSparkUtils
> import org.apache.hudi.common.config.TypedProperties
> import org.apache.hudi.common.util.StringUtils
> import org.apache.hudi.exception.HoodieException
> import org.apache.hudi.functional.TestSparkSqlWithCustomKeyGenerator._
> import org.apache.hudi.testutils.HoodieClientTestUtils.createMetaClient
> import org.apache.hudi.util.SparkKeyGenUtils
> import org.apache.spark.sql.SaveMode
> import org.apache.spark.sql.hudi.common.HoodieSparkSqlTestBase
> import org.joda.time.DateTime
> import org.joda.time.format.DateTimeFormat
> import org.junit.jupiter.api.Assertions.{assertEquals, assertFalse, assertTrue}
> import org.slf4j.LoggerFactory
>     val df = spark.sql(
>       s"""SELECT 1 as id, 'a1' as name, 1.6 as price, 1704121827 as ts, 'cat1' as segment
>          | UNION
>          | SELECT 2 as id, 'a2' as name, 10.8 as price, 1704121827 as ts, 'cat1' as segment
>          | UNION
>          | SELECT 3 as id, 'a3' as name, 30.0 as price, 1706800227 as ts, 'cat1' as segment
>          | UNION
>          | SELECT 4 as id, 'a4' as name, 103.4 as price, 1701443427 as ts, 'cat2' as segment
>          | UNION
>          | SELECT 5 as id, 'a5' as name, 1999.0 as price, 1704121827 as ts, 'cat2' as segment
>          | UNION
>          | SELECT 6 as id, 'a6' as name, 80.0 as price, 1704121827 as ts, 'cat3' as segment
>          |""".stripMargin)
> df.write.format("hudi")
>   .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
>   .option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.CustomAvroKeyGenerator")
>   .option("hoodie.datasource.write.partitionpath.field", "segment:simple,ts:timestamp")
>   .option("hoodie.datasource.write.recordkey.field", "id")
>   .option("hoodie.datasource.write.precombine.field", "name")
>   .option("hoodie.table.name", "hudi_table_2")
>   .option("hoodie.insert.shuffle.parallelism", "1")
>   .option("hoodie.upsert.shuffle.parallelism", "1")
>   .option("hoodie.bulkinsert.shuffle.parallelism", "1")
>   .option("hoodie.keygen.timebased.timestamp.type", "SCALAR")
>   .option("hoodie.keygen.timebased.output.dateformat", "yyyy/MM/DD")
>   .option("hoodie.keygen.timebased.timestamp.scalar.time.unit", "seconds")
>   .mode(SaveMode.Overwrite)
>   .save("/user/hive/warehouse/hudi_table_2")
> // Sync with hive
> /var/hoodie/ws/hudi-sync/hudi-hive-sync/run_sync_tool.sh \
>   --jdbc-url jdbc:hive2://hiveserver:10000 \
>   --user hive \
>   --pass hive \
>   --partitioned-by segment,ts \
>   --base-path /user/hive/warehouse/hudi_table_2 \
>   --database default \
>   --table hudi_table_2 \
>   --partition-value-extractor org.apache.hudi.hive.MultiPartKeysValueExtractor
> {code}
>  
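> The shape of the failing partition paths can be previewed outside Hudi. The sketch below is a hypothetical, standalone illustration using plain {{java.text}} (not Hudi's actual key generator) of how the configured output format turns the single ts field into three path segments. Note that in Java date patterns "DD" means day-of-year; for the Jan 1 sample timestamp it happens to print "01" either way.
> {code:java}
> import java.text.SimpleDateFormat;
> import java.util.Date;
> import java.util.TimeZone;
> 
> public class PartitionPathPreview {
>   public static void main(String[] args) {
>     // Same pattern as hoodie.keygen.timebased.output.dateformat above.
>     // ("DD" is day-of-year in Java patterns; for Jan 1 it prints "01" anyway.)
>     SimpleDateFormat fmt = new SimpleDateFormat("yyyy/MM/DD");
>     fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
> 
>     // ts = 1704121827 seconds, one of the sample rows above (Jan 1, 2024 UTC).
>     String tsSegment = fmt.format(new Date(1704121827L * 1000L));
>     System.out.println(tsSegment); // 2024/01/01
> 
>     // Combined with the "segment" field, the relative partition path has
>     // four slash-separated parts, although only two partition keys exist.
>     String partitionPath = "cat1/" + tsSegment;
>     System.out.println(partitionPath.split("/").length); // 4
>   }
> }
> {code}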
> Hive sync then fails while adding the table partitions:
> {code:java}
> 2024-10-06 14:33:44,200 INFO  [main] hive.metastore (HiveMetaStoreClient.java:close(564)) - Closed a connection to metastore, current connections: 0
> Exception in thread "main" org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing hudi_table_2
>     at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:180)
>     at org.apache.hudi.hive.HiveSyncTool.main(HiveSyncTool.java:547)
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: failed to sync the table hudi_table_2_ro
>     at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:272)
>     at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:203)
>     at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:177)
>     ... 1 more
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table hudi_table_2_ro
>     at org.apache.hudi.hive.HiveSyncTool.syncAllPartitions(HiveSyncTool.java:474)
>     at org.apache.hudi.hive.HiveSyncTool.validateAndSyncPartitions(HiveSyncTool.java:321)
>     at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:261)
>     ... 3 more
> Caused by: java.lang.IllegalArgumentException: Partition key parts [segment, ts] does not match with partition values [cat1, 2024, 01, 01]. Check partition strategy.
>     at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:42)
>     at org.apache.hudi.hive.ddl.QueryBasedDDLExecutor.getPartitionClause(QueryBasedDDLExecutor.java:191)
>     at org.apache.hudi.hive.ddl.QueryBasedDDLExecutor.constructAddPartitions(QueryBasedDDLExecutor.java:164)
>     at org.apache.hudi.hive.ddl.QueryBasedDDLExecutor.addPartitionsToTable(QueryBasedDDLExecutor.java:124)
>     at org.apache.hudi.hive.HoodieHiveSyncClient.addPartitionsToTable(HoodieHiveSyncClient.java:118)
>     at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:516)
>     at org.apache.hudi.hive.HiveSyncTool.syncAllPartitions(HiveSyncTool.java:470)
>     ... 5 more
> {code}
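> The exception comes from the size check hit in QueryBasedDDLExecutor.getPartitionClause: the two configured partition keys are compared against the values recovered by splitting the relative storage path on "/". A minimal standalone sketch of that check (hypothetical class name, mirroring the behavior rather than the actual Hudi sources):
> {code:java}
> import java.util.Arrays;
> import java.util.List;
> 
> public class PartitionMismatchSketch {
>   public static void main(String[] args) {
>     // Keys from hoodie.datasource.write.partitionpath.field.
>     List<String> keys = Arrays.asList("segment", "ts");
> 
>     // A multi-part value extractor splits the relative path on "/",
>     // so "cat1/2024/01/01" yields four values for the two keys.
>     List<String> values = Arrays.asList("cat1/2024/01/01".split("/"));
> 
>     // Mirrors the checkArgument that fails in the stack trace above.
>     if (keys.size() != values.size()) {
>       throw new IllegalArgumentException("Partition key parts " + keys
>           + " does not match with partition values " + values
>           + ". Check partition strategy.");
>     }
>   }
> }
> {code}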



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
