hudi-bot opened a new issue, #16021:
URL: https://github.com/apache/hudi/issues/16021
{code:scala}
spark.sql(
s"""insert into $tableName values
|(5, 'a', 35, 1000, '2021-01-05'),
|(1, 'a', 31, 1000, '2021-01-05'),
|(3, 'a', 33, 1000, '2021-01-05'),
|(4, 'b', 16, 1000, '2021-01-05'),
|(2, 'b', 18, 1000, '2021-01-05'),
|(6, 'b', 17, 1000, '2021-01-05'),
|(8, 'a', 21, 1000, '2021-01-05'),
|(9, 'a', 22, 1000, '2021-01-05'),
|(7, 'a', 23, 1000, '2021-01-05')
|""".stripMargin)
// Insert overwrite static partition
spark.sql(
s"""
| insert overwrite table $tableName partition(dt = '2021-01-05')
| select * from (select 13 , 'a2', 12, 1000) limit 10
""".stripMargin)
spark.sql(
s"""
| insert into $tableName values
| (5, 'a3', 35, 1000, '2021-01-05'),
| (3, 'a3', 33, 1000, '2021-01-05')
""".stripMargin)
{code}
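The repro assumes `$tableName` already exists as a Hudi table with a consistent hashing bucket index. One plausible definition is sketched below; the table name, column names, and bucket count are assumptions for illustration, not taken from the report (consistent hashing buckets require a MOR table):

{code:scala}
// Hypothetical setup for the repro above -- column names and bucket
// count are assumptions; the index properties are standard Hudi configs.
val tableName = "hudi_consistent_bucket_tbl"
spark.sql(
  s"""create table $tableName (
     |  id int, name string, price double, ts long, dt string
     |) using hudi
     |partitioned by (dt)
     |tblproperties (
     |  type = 'mor',
     |  primaryKey = 'id',
     |  preCombineField = 'ts',
     |  'hoodie.index.type' = 'BUCKET',
     |  'hoodie.index.bucket.engine' = 'CONSISTENT_HASHING',
     |  'hoodie.bucket.index.num.buckets' = '4'
     |)""".stripMargin)
{code}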
After running the above case, we expect the snapshot query to return
(13, 'a2', 12.0, 1000, '2021-01-05'), (5, 'a3', 35, 1000, '2021-01-05'), and
(3, 'a3', 33, 1000, '2021-01-05').
But the actual result is only (13, 'a2', 12.0, 1000, '2021-01-05').
The root cause is that after an insert overwrite on a table with a consistent
hashing bucket index, the file groups recorded in consistent_hashing_metadata
no longer match the file groups on storage, so subsequent inserts are routed
to file groups that the snapshot query does not read.
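The mismatch can be pictured with a toy model (plain Scala; the names and structure here are invented for illustration and are not Hudi's actual code). The index metadata maps hash buckets to file group IDs, and the writer routes each record by looking up its bucket. If insert overwrite replaces the partition's file groups on storage without rewriting that metadata, later lookups return IDs of file groups the snapshot no longer reads:

{code:scala}
// Toy illustration only -- all names are invented, not Hudi internals.
// Index metadata: bucket number -> file group id.
val metadata = Map(0 -> "fg-old-0", 1 -> "fg-old-1")

// Storage after insert overwrite: the partition's file groups were replaced.
val fileGroupsOnStorage = Set("fg-new-0")

def bucketFor(recordKey: Int): Int = recordKey % metadata.size

// A later insert consults the stale metadata...
val target = metadata(bucketFor(5))                           // "fg-old-1"
// ...and writes into a file group the snapshot query never reads.
val visibleInSnapshot = fileGroupsOnStorage.contains(target)  // false
{code}

Keeping consistent_hashing_metadata in sync with the replace commit would make the lookup and the storage layout agree again.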
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-6364
- Type: Bug