[ https://issues.apache.org/jira/browse/HUDI-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jing Zhang reassigned HUDI-6364:
--------------------------------
Assignee: Jing Zhang
> InsertOverwrite operation on consistent hashing resulting in wrong data
> -----------------------------------------------------------------------
>
> Key: HUDI-6364
> URL: https://issues.apache.org/jira/browse/HUDI-6364
> Project: Apache Hudi
> Issue Type: Bug
> Components: index
> Reporter: Jing Zhang
> Assignee: Jing Zhang
> Priority: Major
>
> {code:java}
> // Initial insert: nine rows into partition dt = '2021-01-05'
> spark.sql(
> s"""insert into $tableName values
> |(5, 'a', 35, 1000, '2021-01-05'),
> |(1, 'a', 31, 1000, '2021-01-05'),
> |(3, 'a', 33, 1000, '2021-01-05'),
> |(4, 'b', 16, 1000, '2021-01-05'),
> |(2, 'b', 18, 1000, '2021-01-05'),
> |(6, 'b', 17, 1000, '2021-01-05'),
> |(8, 'a', 21, 1000, '2021-01-05'),
> |(9, 'a', 22, 1000, '2021-01-05'),
> |(7, 'a', 23, 1000, '2021-01-05')
> |""".stripMargin)
> // Insert overwrite static partition
> spark.sql(
> s"""
> | insert overwrite table $tableName partition(dt = '2021-01-05')
> | select * from (select 13 , 'a2', 12, 1000) limit 10
> """.stripMargin)
> // Follow-up insert into the partition that was just overwritten
> spark.sql(
> s"""
> | insert into $tableName values
> | (5, 'a3', 35, 1000, '2021-01-05'),
> | (3, 'a3', 33, 1000, '2021-01-05')
> """.stripMargin)
> {code}
> After running the above case, the expected snapshot result is
> (13, "a2", 12.0, 1000, "2021-01-05"), (5, "a3", 35, 1000, "2021-01-05"), (3,
> "a3", 33, 1000, "2021-01-05").
> But the actual result is only (13,a2,12.0,1000,2021-01-05).
> The root cause is that after running insert overwrite on a table with a
> consistent bucket index, the file groups recorded in
> consistent_hashing_metadata no longer match the file groups on storage.
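>
> To illustrate the root cause, here is a minimal toy model in Scala. It is a
> sketch under assumptions, not Hudi code: Row, ToyBucketTable, the ring size
> of 16 and the file group names are all hypothetical. It assumes that later
> writes are routed through the stale hashing metadata into file groups that
> the overwrite has already replaced, which would explain why those rows are
> missing from the snapshot.
>
> {code:scala}
> import scala.collection.mutable
>
> // Toy model only: none of these names are Hudi classes or APIs.
> final case class Row(id: Int, name: String)
>
> class ToyBucketTable {
>   private val ringSize = 16
>   // Toy "hashing metadata": exclusive upper bound of a hash range -> file group.
>   private val hashingMetadata: Seq[(Int, String)] =
>     Seq(8 -> "fg-old-0", 16 -> "fg-old-1")
>   // File groups present on storage, and the subset a snapshot read still sees.
>   private val storage = mutable.Map.empty[String, Vector[Row]]
>   private val visible = mutable.Set("fg-old-0", "fg-old-1")
>
>   // Route a key through the hashing metadata only, without checking storage.
>   private def locate(id: Int): String = {
>     val h = Math.floorMod(id, ringSize)
>     hashingMetadata.collectFirst { case (upper, fg) if h < upper => fg }.get
>   }
>
>   // Write each row into the file group chosen by the metadata (upsert by id).
>   def insert(rows: Seq[Row]): Unit = rows.foreach { r =>
>     val fg = locate(r.id)
>     storage(fg) = storage.getOrElse(fg, Vector.empty).filterNot(_.id == r.id) :+ r
>   }
>
>   // Overwrite replaces every visible file group with a new one, but
>   // (this is the modelled bug) leaves hashingMetadata untouched.
>   def insertOverwrite(rows: Seq[Row]): Unit = {
>     visible.clear()
>     storage("fg-new-0") = rows.toVector
>     visible += "fg-new-0"
>   }
>
>   // A snapshot read only returns rows from file groups that are still visible.
>   def snapshot(): Seq[Row] =
>     visible.toSeq.flatMap(fg => storage.getOrElse(fg, Vector.empty)).sortBy(_.id)
> }
>
> object StaleMetadataSketch {
>   def main(args: Array[String]): Unit = {
>     val t = new ToyBucketTable
>     t.insert((1 to 9).map(i => Row(i, if (Set(2, 4, 6)(i)) "b" else "a")))
>     t.insertOverwrite(Seq(Row(13, "a2")))
>     // Routed via stale metadata into fg-old-0, which is no longer visible.
>     t.insert(Seq(Row(5, "a3"), Row(3, "a3")))
>     println(t.snapshot())
>   }
> }
> {code}
>
> In this toy model the final snapshot contains only the row with id 13,
> mirroring the actual result above. If insertOverwrite also rewrote
> hashingMetadata to point at the new file group, the two later rows would
> land in a visible file group and appear in the snapshot.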
--
This message was sent by Atlassian Jira
(v8.20.10#820010)