hudi-bot opened a new issue, #16021:
URL: https://github.com/apache/hudi/issues/16021
{code:scala}
spark.sql(
s"""insert into $tableName values
|(5, 'a', 35, 1000, '2021-01-05'),
|(1, 'a', 31, 1000, '2021-01-05'),
|(3, 'a', 33, 1000, '2021-01-05'),
|(4, 'b', 16, 1000, '2021-01-05'),
|(2, 'b', 18, 1000, '2021-01-05'),
|(6, 'b', 17, 1000, '2021-01-05'),
|(8, 'a', 21, 1000, '2021-01-05'),
|(9, 'a', 22, 1000, '2021-01-05'),
|(7, 'a', 23, 1000, '2021-01-05')
|""".stripMargin)
// Insert overwrite static partition
spark.sql(
s"""
| insert overwrite table $tableName partition(dt = '2021-01-05')
| select * from (select 13 , 'a2', 12, 1000) limit 10
""".stripMargin)
spark.sql(
s"""
| insert into $tableName values
| (5, 'a3', 35, 1000, '2021-01-05'),
| (3, 'a3', 33, 1000, '2021-01-05')
""".stripMargin)
{code}
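The repro assumes `$tableName` already exists as a Hudi table with a consistent hashing bucket index. One plausible definition is sketched below; the table name, column names, and bucket count are assumptions for illustration, not taken from the report (consistent hashing buckets require a MOR table):

{code:scala}
// Hypothetical setup for the repro above -- column names and bucket
// count are assumptions; the index properties are standard Hudi configs.
val tableName = "hudi_consistent_bucket_tbl"
spark.sql(
  s"""create table $tableName (
     |  id int, name string, price double, ts long, dt string
     |) using hudi
     |partitioned by (dt)
     |tblproperties (
     |  type = 'mor',
     |  primaryKey = 'id',
     |  preCombineField = 'ts',
     |  'hoodie.index.type' = 'BUCKET',
     |  'hoodie.index.bucket.engine' = 'CONSISTENT_HASHING',
     |  'hoodie.bucket.index.num.buckets' = '4'
     |)""".stripMargin)
{code}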
After running the above case, we expect the snapshot query to return
(13, 'a2', 12.0, 1000, '2021-01-05'), (5, 'a3', 35, 1000, '2021-01-05'), and
(3, 'a3', 33, 1000, '2021-01-05').
But the actual result is only (13, 'a2', 12.0, 1000, '2021-01-05').
The root cause is that after an insert overwrite on a table with a consistent
hashing bucket index, the file groups recorded in consistent_hashing_metadata
no longer match the file groups on storage, so subsequent inserts are routed
to file groups that the snapshot query does not read.
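The mismatch can be pictured with a toy model (plain Scala; the names and structure here are invented for illustration and are not Hudi's actual code). The index metadata maps hash buckets to file group IDs, and the writer routes each record by looking up its bucket. If insert overwrite replaces the partition's file groups on storage without rewriting that metadata, later lookups return IDs of file groups the snapshot no longer reads:

{code:scala}
// Toy illustration only -- all names are invented, not Hudi internals.
// Index metadata: bucket number -> file group id.
val metadata = Map(0 -> "fg-old-0", 1 -> "fg-old-1")

// Storage after insert overwrite: the partition's file groups were replaced.
val fileGroupsOnStorage = Set("fg-new-0")

def bucketFor(recordKey: Int): Int = recordKey % metadata.size

// A later insert consults the stale metadata...
val target = metadata(bucketFor(5))                           // "fg-old-1"
// ...and writes into a file group the snapshot query never reads.
val visibleInSnapshot = fileGroupsOnStorage.contains(target)  // false
{code}

Keeping consistent_hashing_metadata in sync with the replace commit would make the lookup and the storage layout agree again.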
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-6364
- Type: Bug