Hfal91 opened a new issue, #13153:
URL: https://github.com/apache/hudi/issues/13153
**Describe the problem you faced**
If the table is queried while a write job that updates the partition
field is running, there is a brief window in which the query returns
duplicates.
It seems to me that this happens at the moment when the new version of the
record has been created in the new partition but the old version has not
yet been removed from the old partition.
Once the job finishes, the table no longer returns duplicates.
Is there a way to solve this in this version of Hudi (v0.14.1), or has it
already been fixed in newer versions?
**To Reproduce**
Steps to reproduce the behavior:
1. Have a big table partitioned by a specific field
2. Run a job that updates the partition field of existing records
3. Query the table (in my case using Athena) while the job is running; you
may need to query several times before hitting the moment when it returns
duplicates
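One way to confirm the symptom from query results is to look for record keys that appear under more than one partition value. The following is only a minimal sketch, assuming each result row is available as a `(record_key, partition_value)` pair; the function name and the sample rows are hypothetical and not taken from the actual table.

```python
from collections import defaultdict


def keys_in_multiple_partitions(rows):
    """Return {record_key: sorted partition values} for keys seen in more than one partition."""
    partitions_by_key = defaultdict(set)
    for record_key, partition_value in rows:
        partitions_by_key[record_key].add(partition_value)
    # Keep only keys that are visible in two or more partitions at once.
    return {
        key: sorted(partitions)
        for key, partitions in partitions_by_key.items()
        if len(partitions) > 1
    }


# Hypothetical rows, standing in for a query result fetched mid-write.
sample_rows = [("id-1", "2024-01"), ("id-2", "2024-01"), ("id-1", "2024-02")]
duplicates = keys_in_multiple_partitions(sample_rows)
```

An empty result means no key was visible in two partitions for that particular query; it does not rule out the race on a later query.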
Relevant options used:
'hoodie.datasource.write.table.type': 'MERGE_ON_READ',
'hoodie.datasource.write.operation': 'upsert',
'hoodie.index.type': 'RECORD_INDEX',
'hoodie.record.index.update.partition.path': 'true',
'hoodie.compact.inline.max.delta.commits': '1'
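For reference, these options could be passed to a Spark DataFrame write roughly as follows. This is a sketch under assumptions rather than the actual job: the function name `write_upsert`, the `extra_options` parameter, and the base path are hypothetical, and required settings the issue does not list (such as table name, record key, and partition path field) would have to be supplied through `extra_options`.

```python
# Options copied from the issue; nothing else is assumed about the job.
hudi_options = {
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.index.type": "RECORD_INDEX",
    "hoodie.record.index.update.partition.path": "true",
    "hoodie.compact.inline.max.delta.commits": "1",
}


def write_upsert(df, base_path, extra_options=None):
    """Upsert a Spark DataFrame into a Hudi table at base_path (hypothetical helper).

    extra_options carries the settings not stated in the issue,
    e.g. table name, record key field, and partition path field.
    """
    options = {**hudi_options, **(extra_options or {})}
    df.write.format("hudi").options(**options).mode("append").save(base_path)
```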
**Expected behavior**
The table should not return duplicates at any time, including while the
write job is running.
**Environment Description**
* Hudi version : v0.14.1
* Spark version : 3.5.1
* Hive version : 3.1.3
* Storage (HDFS/S3/GCS..) : S3