[GitHub] [hudi] ad1happy2go commented on issue #9469: [SUPPORT] Exception when using MERGE INTO

via GitHub Tue, 22 Aug 2023 11:03:10 -0700


ad1happy2go commented on issue #9469:
URL: https://github.com/apache/hudi/issues/9469#issuecomment-1688671086


   @praneethh I am able to reproduce the issue with the code below:
   
   ```
   // Run in spark-shell (e.g. via :paste) with the Hudi Spark bundle on the classpath.
   import org.apache.spark.sql.functions.col
   import spark.implicits._

   val tableName = "issue_9469_53"
   val path = s"file:///tmp/${tableName}"

   // First batch: a single record for emp_id 1 with log_dt = 2023-08-04
   val df = Seq(("1", "neo", "1", "1", "2023-08-04"))
     .toDF("emp_id", "emp_name", "log_ts", "load_ts", "log_dt")

   df.select(
       col("emp_id").cast("int"),
       col("emp_name").cast("string"),
       col("log_ts").cast("int"),
       col("load_ts").cast("int"),
       col("log_dt").cast("date"))
     .write.format("hudi")
     .option("hoodie.datasource.write.recordkey.field", "emp_id")
     .option("hoodie.datasource.write.partitionpath.field", "log_dt")
     .option("hoodie.index.type", "GLOBAL_SIMPLE")
     .option("hoodie.table.name", tableName)
     .option("hoodie.simple.index.update.partition.path", "false")
     .option("hoodie.datasource.write.precombine.field", "load_ts")
     .option("hoodie.datasource.hive_sync.enable", "true")
     .option("hoodie.datasource.hive_sync.database", "default")
     .option("hoodie.datasource.hive_sync.table", tableName)
     .option("hoodie.datasource.hive_sync.partition_fields", "log_dt")
     .option("hoodie.datasource.hive_sync.ignore_exceptions", "true")
     .option("hoodie.datasource.hive_sync.mode", "hms")
     .option("hoodie.datasource.hive_sync.use_jdbc", "false")
     .option("hoodie.datasource.write.operation", "upsert")
     .mode("append")
     .save(path)

   // Second batch: emp_id 1 arrives with a new log_dt (2023-08-05), emp_id 2 is new
   val df2 = Seq(
       ("1", "neo", "2", "2", "2023-08-05"),
       ("2", "trinity", "2", "2", "2023-08-05"))
     .toDF("emp_id", "emp_name", "log_ts", "load_ts", "log_dt")

   // Cast to the same column types as the first batch
   val df3 = df2.select(
     col("emp_id").cast("int"),
     col("emp_name").cast("string"),
     col("log_ts").cast("int"),
     col("load_ts").cast("int"),
     col("log_dt").cast("date"))

   df3.createOrReplaceTempView("incremental_data")

   // On match, update only log_ts, the partition column log_dt, and load_ts; otherwise insert all columns.
   // The MERGE target is the table registered by hive_sync in the `default` database.
   val sqlPartialUpdate =
     s"""
        | merge into ${tableName} as target
        | using (
        |   select * from incremental_data
        | ) source
        | on target.emp_id = source.emp_id
        | when matched then
        |   update set target.log_ts = source.log_ts, target.log_dt = source.log_dt, target.load_ts = source.load_ts
        | when not matched then insert *
        """.stripMargin

   spark.sql(sqlPartialUpdate)
   ```
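   For reference, here is a hypothetical, Hudi-free sketch of what the MERGE INTO above is logically expected to produce, assuming plain SQL MERGE semantics keyed on `emp_id`. It does not model Hudi specifics such as the GLOBAL_SIMPLE index or `hoodie.simple.index.update.partition.path=false`, which decide where the updated record for emp_id 1 is actually written. `Emp`, `MergeSketch` and `merge` are illustrative names only, and `log_dt` is kept as a string for simplicity.

   ```scala
   // Hypothetical in-memory model of the MERGE INTO statement; this is NOT Hudi code.
   // Assumption: plain SQL MERGE semantics; Hudi index / partition-path handling is ignored.
   final case class Emp(empId: Int, empName: String, logTs: Int, loadTs: Int, logDt: String)

   object MergeSketch {
     // "when matched then update set log_ts, log_dt, load_ts" and
     // "when not matched then insert *", joined on emp_id.
     def merge(target: Seq[Emp], source: Seq[Emp]): Seq[Emp] = {
       val sourceById = source.map(s => s.empId -> s).toMap
       // Matched rows: only the three columns in the SET clause change; emp_name is kept.
       val updated = target.map { t =>
         sourceById.get(t.empId) match {
           case Some(s) => t.copy(logTs = s.logTs, logDt = s.logDt, loadTs = s.loadTs)
           case None    => t
         }
       }
       val targetIds = target.map(_.empId).toSet
       // Source rows with no matching emp_id in the target are inserted unchanged.
       val inserted = source.filterNot(s => targetIds.contains(s.empId))
       updated ++ inserted
     }

     def main(args: Array[String]): Unit = {
       val target = Seq(Emp(1, "neo", 1, 1, "2023-08-04"))
       val source = Seq(
         Emp(1, "neo", 2, 2, "2023-08-05"),
         Emp(2, "trinity", 2, 2, "2023-08-05"))
       merge(target, source).sortBy(_.empId).foreach(println)
     }
   }
   ```

   Running `MergeSketch.main` prints `Emp(1,neo,2,2,2023-08-05)` and `Emp(2,trinity,2,2,2023-08-05)`: emp_id 1 keeps its name but takes the new log_ts, load_ts and log_dt, and emp_id 2 is inserted as-is.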
   
   I have created a JIRA ticket (https://issues.apache.org/jira/browse/HUDI-6737) to track this issue and will be working on it.

