jonvex commented on code in PR #10337:
URL: https://github.com/apache/hudi/pull/10337#discussion_r1761412060
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/TestMergeIntoTable.scala:
##########
@@ -263,6 +340,137 @@ class TestMergeIntoTable extends HoodieSparkSqlTestBase
with ScalaAssertionSupport
})
}
+ test("Test MergeInto with changing partition and global index") {
+ withRecordType()(withTempDir { tmp =>
+ withSQLConf("hoodie.index.type" -> "GLOBAL_SIMPLE") {
+ Seq("cow","mor").foreach { tableType => {
+ val sourceTable = generateTableName
+ val targetTable = generateTableName
+ spark.sql(
+ s"""
+ | create table $sourceTable
+ | using parquet
+ | partitioned by (partition)
+ | location '${tmp.getCanonicalPath}/$sourceTable'
+ | as
+ | select
+ | 1 as id,
+ | 2 as version,
+ | 'yes' as mergeCond,
+ | '2023-10-02' as partition
+ """.stripMargin
+ )
+ spark.sql(s"insert into $sourceTable values(2, 2, 'no', '2023-10-02')")
+ spark.sql(s"insert into $sourceTable values(3, 1, 'insert', '2023-10-01')")
+
+ spark.sql(
+ s"""
+ | create table $targetTable (
+ | id int,
+ | version int,
+ | mergeCond string,
+ | partition string
+ | ) using hudi
+ | partitioned by (partition)
+ | tblproperties (
+ | 'primaryKey' = 'id',
+ | 'type' = '$tableType',
+ | 'payloadClass' = 'org.apache.hudi.common.model.DefaultHoodieRecordPayload',
+ | 'payloadType' = 'CUSTOM',
+ | preCombineField = 'version'
+ | )
+ | location '${tmp.getCanonicalPath}/$targetTable'
+ """.stripMargin)
+
+ spark.sql(s"insert into $targetTable values(1, 1, 'insert', '2023-10-01')")
+ spark.sql(s"insert into $targetTable values(2, 3, 'insert', '2023-10-01')")
+
+ spark.sql(
+ s"""
+ | merge into $targetTable t using
+ | (select * from $sourceTable) as s
+ | on t.id=s.id
+ | when matched and s.mergeCond = 'yes' then update set *
+ | when not matched then insert *
+ """.stripMargin)
+ checkAnswer(s"select id,version,_hoodie_partition_path from $targetTable order by id")(
+ Seq(1, 2, "partition=2023-10-02"),
Review Comment:
yeah that is confusing. The source table is created with:
```
spark.sql(
s"""
| create table $sourceTable
| using parquet
| partitioned by (partition)
| location '${tmp.getCanonicalPath}/$sourceTable'
| as
| select
| 1 as id,
| 2 as version,
| 'yes' as mergeCond,
| '2023-10-02' as partition
""".stripMargin
)
```
so that is where that record is coming from
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]