[
https://issues.apache.org/jira/browse/HUDI-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
vinoyang closed HUDI-796.
-------------------------
Fix Version/s: 0.6.1
Resolution: Done
Done via master branch: 73e5b4c7bbd1306a6a2686f6e8d85a2c871ac7ff
> Rewrite DedupeSparkJob.scala without considering the _hoodie_commit_time
> ------------------------------------------------------------------------
>
> Key: HUDI-796
> URL: https://issues.apache.org/jira/browse/HUDI-796
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Pratyaksh Sharma
> Assignee: Pratyaksh Sharma
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.1
>
>
> `_hoodie_commit_time` can only be used for deduping a partition path if the
> duplicates arose from an INSERT operation. In the case of updates, the bloom
> filter tags both files where a record is present for the update, and from
> then on all such files carry the same `_hoodie_commit_time` for a duplicate
> record.
> Hence it makes sense to rewrite this class without relying on that metadata
> field.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)