[ https://issues.apache.org/jira/browse/HUDI-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Guo updated HUDI-5501:
----------------------------
Description: In Deltastreamer, users can set --filter-dupes to deduplicate
the input records. Currently, Hudi does not report any metric on the
duplicates. It would be good to report such metrics so that users can monitor
how many records are filtered out in each commit.
> Report metrics on the number of duplicates after dedup
> ------------------------------------------------------
>
> Key: HUDI-5501
> URL: https://issues.apache.org/jira/browse/HUDI-5501
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Ethan Guo
> Priority: Major
>
> In Deltastreamer, users can set --filter-dupes to deduplicate the input
> records. Currently, Hudi does not report any metric on the duplicates. It
> would be good to report such metrics so that users can monitor how many
> records are filtered out in each commit.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)