A first-time contributor named Adam Fisher and I submitted PRs for a "deduplicate record" processor at roughly the same time. His focused mainly on removing duplicates from within a record set, using the record set itself as the source of truth, whereas mine relied on a DistributedMapCache and record path operations to target data lake-wide deduplication.
Here's his PR for reference: https://github.com/apache/nifi/pull/3317

The Git history is fairly broken at this point (I tried a rebase and found some really bad merge commits), but I was able to squash it and cherry-pick it onto main. I think these are two separate use cases and should probably be two separate processors to keep things simple. Before I put much effort into pushing both PRs along, I'd like to know whether anyone else has preferences or ideas on this.

Thanks,
Mike
