wombatu-kun commented on issue #13219: URL: https://github.com/apache/hudi/issues/13219#issuecomment-2834016064
Here are the results for this test with different table types and indexes:

```
correct:
COW any idx:  [3,130,aa3], [5,151,oo], [6,161,oo], [7,170,aa7]
MOR INMEMORY: [3,130,aa3], [5,151,oo], [6,161,oo], [7,170,aa7]

incorrect:
MOR BUCKET 1: [3,131,oo], [5,151,oo], [6,161,oo], [7,171,oo]
MOR BUCKET 5: [3,131,oo], [5,151,oo], [6,161,oo], [7,170,aa7]
MOR SIMPLE:   [3,131,oo], [5,151,oo], [6,161,oo], [7,170,aa7]
MOR BLOOM:    [3,131,oo], [5,151,oo], [6,161,oo], [7,170,aa7]
```

As I understand it, the MERGE INTO logic decides whether to update or insert each record based on its currentLocation. So if the HoodieIndex has somehow computed a currentLocation for a record, that record goes through the "when matched" execution branch, regardless of whether the record really exists. This logic is fine for a regular upsert operation, but as you can see, it is not suitable for MERGE INTO.

@danny0405 @yihua @nsivabalan @jonvex @codope @vinothchandar what do you think is the best way to fix it? Please advise on the proper place in the code to change this logic: the indexes, HoodieSparkSqlWriter, the MergeInto command, or somewhere else?
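
To make the suspected failure mode concrete, here is a minimal toy sketch in Python. It is not Hudi code: `Record`, `tag_by_hash`, `tag_by_lookup`, and `route_by_location` are hypothetical names invented for illustration, and the keys are arbitrary rather than taken from the test above. It only shows how routing on "has a location" can differ from routing on "actually exists" when an index assigns a location to every incoming key.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass
class Record:
    key: int
    # Hypothetical stand-in for the location an index tags onto a record.
    current_location: Optional[str] = None


def tag_by_hash(keys: Iterable[int], num_buckets: int) -> List[Record]:
    """Toy hash-style index: derives a location from the key alone,
    so every record gets one, whether or not the key is stored."""
    return [Record(k, f"bucket-{k % num_buckets}") for k in keys]


def tag_by_lookup(keys: Iterable[int], stored_keys: Set[int]) -> List[Record]:
    """Toy lookup-style index: sets a location only for keys that are stored."""
    return [Record(k, "file-0" if k in stored_keys else None) for k in keys]


def route_by_location(records: List[Record]) -> Tuple[List[int], List[int]]:
    """Send a record to the 'matched' branch iff it carries a location."""
    matched = [r.key for r in records if r.current_location is not None]
    not_matched = [r.key for r in records if r.current_location is None]
    return matched, not_matched


stored = {2, 3}          # keys already in the toy table
incoming = [1, 2, 3, 4]  # keys in the toy merge source

# Hash-style tagging: all four keys look "matched", including new keys 1 and 4.
hash_matched, hash_not_matched = route_by_location(tag_by_hash(incoming, 1))

# Lookup-style tagging: only the stored keys 2 and 3 are "matched".
lookup_matched, lookup_not_matched = route_by_location(
    tag_by_lookup(incoming, stored)
)
```

In this toy model, the new keys would run through the update branch under hash-style tagging, which is harmless for a plain upsert but changes the outcome once the "matched" and "not matched" branches do different things.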
