cshuo opened a new issue, #17641:
URL: https://github.com/apache/hudi/issues/17641

   ### Bug Description
   
   **What happened:**
   When the ordering field is of String type and a record is marked as a
   delete record by `_hoodie_is_deleted = true`, the delete record is always
   chosen during merging, regardless of its ordering value.
   
   **What you expected:**
   A delete record whose ordering value is smaller than that of the existing
   record should not be chosen during merging; the existing record should be kept.
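
   The expected behavior can be illustrated with a minimal sketch. This is not
   Hudi's merger implementation: `OrderingMergeSketch`, `Rec`, and `merge` are
   hypothetical names introduced only for illustration, and letting the incoming
   record win on equal ordering values is an assumption. The point it shows is
   that the ordering value alone decides the winner, and the delete flag is
   consulted only after that decision.

   ```java
   // Hypothetical sketch of ordering-value based merging; not Hudi's actual code.
   final class OrderingMergeSketch {

     /** Minimal stand-in for a record: key, String ordering value, delete flag. */
     static final class Rec {
       final String key;
       final String orderingValue;
       final boolean deleted;

       Rec(String key, String orderingValue, boolean deleted) {
         this.key = key;
         this.orderingValue = orderingValue;
         this.deleted = deleted;
       }
     }

     /**
      * Picks the winner by ordering value only. The delete flag plays no role in
      * the comparison. Assumption: on equal ordering values the incoming record wins.
      */
     static Rec merge(Rec existing, Rec incoming) {
       // String ordering values compare lexicographically via String.compareTo.
       int cmp = incoming.orderingValue.compareTo(existing.orderingValue);
       return cmp >= 0 ? incoming : existing;
     }
   }
   ```

   With the values from the reproduction, merging the existing record
   (`id2`, ordering value `'103'`) with the delete record (`id2`, ordering value
   `'102'`) returns the existing record, so nothing is deleted. Note that String
   ordering values compare lexicographically, so `'99'` would rank above `'103'`.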
   
   **Steps to reproduce:**
   Put the following test code in `ITTestHoodieDataSource`:
   
   ```java
     @Test
     void testHardDelete() throws Exception {
       ExecMode execMode = ExecMode.BATCH;
       String hoodieTableDDL = "create table t1(\n"
           + "  uuid varchar(20),\n"
           + "  name varchar(10),\n"
           + "  age int,\n"
           + "  _hoodie_is_deleted boolean,\n"
           + "  `partition` varchar(20),\n"
           + "  ts STRING,\n"
           + "  PRIMARY KEY(uuid) NOT ENFORCED\n"
           + ")\n"
           + "PARTITIONED BY (`partition`)\n"
           + "with (\n"
           + "  'connector' = 'hudi',\n"
           + "  'table.type' = 'MERGE_ON_READ',\n"
           + "  'index.type' = 'BUCKET',\n"
           + "  'path' = '" + tempFile.getAbsolutePath() + "',\n"
           + "  'read.streaming.skip_compaction' = 'false'\n"
           + ")";
       batchTableEnv.executeSql(hoodieTableDDL);
   
       // first commit
       String insertInto = "insert into t1 values\n"
           + "('id1','Danny',23,false,'par1', '101'),\n"
           + "('id2','Stephen',33,false,'par1', '103')";
       execInsertSql(batchTableEnv, insertInto);
   
       final String expected = "["
           + "+I[id1, Danny, 23, false, par1, 101], "
           + "+I[id2, Stephen, 33, false, par1, 103]]";
   
       // second commit, hard delete record with smaller order value
       insertInto = "insert into t1 values\n"
           + "('id2','Stephen',33, true,'par1', '102')";
       execInsertSql(batchTableEnv, insertInto);
       List<Row> result2 = execSelectSql(batchTableEnv, "select * from t1", execMode);
       // the delete record has a smaller ordering value ('102' < '103'), so no record should be deleted.
       assertRowsEquals(result2, expected);
     }
   ```
   
   
   ### Environment
   
   **Hudi version:** 0.14.1, 0.15.0
   **Query engine:** Flink 1.20
   **Relevant configs:**
   
   
   ### Logs and Stack Trace
   
   _No response_

