praneethh opened a new issue, #9469:
URL: https://github.com/apache/hudi/issues/9469

   I'm trying to use MERGE INTO to perform a partial update on the target table, but I get the following error:
   
   ```
   java.lang.UnsupportedOperationException: MERGE INTO TABLE is not supported temporarily.
     at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:718)
     at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63)
     at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
     at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
     at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:67)
     at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78)
   ```
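   For reference, this is the error Spark's default planner throws when Hudi's SQL extensions are not registered on the session, so the shell launch may be relevant. A minimal launch that enables them might look like the following (the bundle coordinates are an assumption for Spark 3.3 / Scala 2.12, not my actual setup):

   ```shell
   # Sketch: start spark-shell with Hudi's SQL extensions enabled, which
   # MERGE INTO requires. Bundle coordinates below are assumed, not confirmed.
   spark-shell \
     --packages org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.1 \
     --conf spark.serializer=org.apache.spark.sql.KryoSerializer \
     --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension \
     --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog
   ```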
   
   Steps to reproduce:
   
   1) Load the target table
   
   ```
   import org.apache.spark.sql.functions.col
   import spark.implicits._

   val df = Seq(("1", "neo", "2023-08-04 12:00:00", "2023-08-04 12:00:00", "2023-08-04"))
     .toDF("emp_id", "emp_name", "log_ts", "load_ts", "log_dt")

   val targetDf = df.select(
     col("emp_id").cast("int"),
     col("emp_name").cast("string"),
     col("log_ts").cast("timestamp"),
     col("load_ts").cast("timestamp"),
     col("log_dt").cast("date"))

   targetDf.write.format("hudi")
     .option("hoodie.payload.ordering.field", "load_ts")
     .option("hoodie.datasource.write.recordkey.field", "emp_id")
     .option("hoodie.datasource.write.partitionpath.field", "log_dt")
     .option("hoodie.index.type", "GLOBAL_SIMPLE")
     .option("hoodie.table.name", "hudi_test")
     .option("hoodie.simple.index.update.partition.path", "false")
     .option("hoodie.datasource.write.precombine.field", "load_ts")
     .option("hoodie.datasource.write.payload.class", "org.apache.hudi.common.model.PartialUpdateAvroPayload")
     .option("hoodie.datasource.write.reconcile.schema", "true")
     .option("hoodie.schema.on.read.enable", "true")
     .option("hoodie.datasource.write.hive_style_partitioning", "true")
     .option("hoodie.datasource.write.row.writer.enable", "false")
     .option("hoodie.datasource.hive_sync.enable", "true")
     .option("hoodie.datasource.hive_sync.database", "pharpan")
     .option("hoodie.datasource.hive_sync.table", "hudi_test")
     .option("hoodie.datasource.hive_sync.partition_fields", "log_dt")
     .option("hoodie.datasource.hive_sync.ignore_exceptions", "true")
     .option("hoodie.datasource.hive_sync.mode", "hms")
     .option("hoodie.datasource.hive_sync.use_jdbc", "false")
     .option("hoodie.datasource.write.operation", "upsert")
     .mode("append")
     .save("gs://sample_bucket/hudi_sample_output_data")
   ```
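   One thing I'm not sure about is whether `pharpan.hudi_test` is visible to Spark SQL as a Hudi table, since it was only created through Hive sync. A sketch of registering the existing path as a Spark SQL Hudi table (an untested assumption on my side, reusing the location from step 1):

   ```scala
   // Sketch (assumption, untested): point Spark SQL at the existing Hudi path
   // so MERGE INTO can resolve pharpan.hudi_test as a Hudi table. Hudi picks up
   // the schema and key configs from hoodie.properties at this location.
   spark.sql("""
     create table if not exists pharpan.hudi_test
     using hudi
     location 'gs://sample_bucket/hudi_sample_output_data'
   """)
   ```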
   
   2) Load the incremental data
   
   ```
   val df2 = Seq(
       ("1", "neo", "2023-08-05 14:00:00", "2023-08-04 12:00:00", "2023-08-05"),
       ("2", "trinity", "2023-08-05 14:00:00", "2023-08-05 15:00:00", "2023-08-05"))
     .toDF("emp_id", "emp_name", "log_ts", "load_ts", "log_dt")

   val incrementalDf = df2.select(
     col("emp_id").cast("int"),
     col("emp_name").cast("string"),
     col("log_ts").cast("timestamp"),
     col("load_ts").cast("timestamp"),
     col("log_dt").cast("date"))

   incrementalDf.createOrReplaceTempView("incremental_data")
   ```
   
   3) Perform merge
   
   ```
   val sqlPartialUpdate =
     s"""
        | merge into pharpan.hudi_test as target
        | using (
        |   select * from incremental_data
        | ) source
        | on target.emp_id = source.emp_id
        | when matched then
        |   update set target.log_ts = source.log_ts, target.log_dt = source.log_dt
        | when not matched then insert *
        """.stripMargin

   spark.sql(sqlPartialUpdate)
   ```
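   In case it helps with diagnosis: my understanding is that if the extension config is missing from the session, Spark plans the MERGE with its default strategy and throws exactly the error above. A quick in-session check:

   ```scala
   // Sketch: check whether Hudi's SQL extension was registered at startup.
   // spark.sql.extensions is a static conf, so it can only be read here,
   // not set after the session exists.
   println(spark.conf.getOption("spark.sql.extensions"))
   ```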
   
   Hudi version: 0.13.1
   Using "org.apache.hudi.common.model.PartialUpdateAvroPayload" for partial update.
   
   Can someone please help resolve this error? Also, please point me to documentation on using MERGE INTO in case I'm using it incorrectly.
   
   
   
   
   

