nsivabalan commented on issue #5869: URL: https://github.com/apache/hudi/issues/5869#issuecomment-1527003611
From what I can glean from the description, it looks like the query is a read-optimized (RO) query and update partition path is set to true. So, with the 2nd commit, the delete record went to a log file in partition creation_date=2015-01-01, while the new insert for the same record key (100) went to the new partition creation_date=2015-01-02. An RO query reads only base files and skips log files, so it still sees the old copy in 2015-01-01 alongside the new one in 2015-01-02 and returns duplicates. If you trigger compaction, this should be resolved. This is a known limitation of RO queries. Also, if you prefer not to update the partition path, e.g. if you want the record with record key 100 to stay in partition 2015-01-01, you should set `hoodie.bloom.index.update.partition.path` = false.
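To make the mechanism concrete, here is a toy sketch in plain Python. It is not Hudi code: `ToyMorTable` and all of its methods are hypothetical names invented for illustration, and it assumes the simplified behaviour described above (the delete lands in a log file of the old partition, the insert lands in a base file of the new partition, and an RO read looks at base files only).

```python
from collections import defaultdict


class ToyMorTable:
    """Hypothetical toy model of a merge-on-read table (not the Hudi API)."""

    def __init__(self):
        self.base = defaultdict(dict)   # partition -> {record_key: value}
        self.logs = defaultdict(list)   # partition -> [(op, record_key, value)]

    def insert(self, partition, key, value):
        # Assumption: a fresh insert is written straight to a base file.
        self.base[partition][key] = value

    def upsert_with_partition_change(self, old_partition, new_partition, key, value):
        # The delete for the old location goes to a log file in the old partition...
        self.logs[old_partition].append(("delete", key, None))
        # ...while the record is inserted into a base file in the new partition.
        self.base[new_partition][key] = value

    def _merged(self):
        # Apply log entries on top of base files, partition by partition.
        merged = {p: dict(recs) for p, recs in self.base.items()}
        for partition, entries in self.logs.items():
            part = merged.setdefault(partition, {})
            for op, key, value in entries:
                if op == "delete":
                    part.pop(key, None)
                else:
                    part[key] = value
        return merged

    @staticmethod
    def _rows(view):
        return sorted((p, k, v) for p, recs in view.items() for k, v in recs.items())

    def read_optimized(self):
        # RO view: base files only, log files are ignored.
        return self._rows(self.base)

    def snapshot(self):
        # Snapshot view: base files merged with log files at read time.
        return self._rows(self._merged())

    def compact(self):
        # Compaction folds the log files into new base files.
        self.base = defaultdict(dict, self._merged())
        self.logs = defaultdict(list)


table = ToyMorTable()
table.insert("creation_date=2015-01-01", 100, "v1")
table.upsert_with_partition_change(
    "creation_date=2015-01-01", "creation_date=2015-01-02", 100, "v2"
)
print(table.read_optimized())  # key 100 appears in both partitions
table.compact()
print(table.read_optimized())  # only the 2015-01-02 copy remains
```

In this model the snapshot view never shows the duplicate, because it applies the logged delete at read time; only the RO view does, and only until compaction runs.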
