[
https://issues.apache.org/jira/browse/NIFI-11945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753453#comment-17753453
]
Peter Kimberley commented on NIFI-11945:
----------------------------------------
In the process of reviewing this processor, I identified the following
additional problems, which are resolved in the referenced PR:
h2. Other issues resolved
# Expression Language attributes `field.name`, `field.value` and `field.type`
are referenced in the documentation but not implemented, which can confuse
users of this processor. These attributes are removed in favour of a
simpler `RecordPath` syntax in dynamic properties.
# Typos and confusing documentation (e.g. saying deduplication only works on a
per-file basis in one area, while contradicting this in another).
# Reliance on map cache values being put in a separate step from the existence
check. This is non-atomic, so it is not safe when run using multiple workers.
Now using the `DistributedMapCacheClient::putIfAbsent()` method to achieve
atomicity.
# NPE when attempting to reference a non-existent record field or one with a
value of `null`. Added handling to treat this as an empty string.
# Hash set filter code path was never reachable due to incorrect equality check
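To illustrate the atomicity point above, here is a minimal sketch, not the code from the PR. It assumes a `ConcurrentHashMap` as an in-memory stand-in for the distributed map cache; the class `DedupSketch` and both method names are hypothetical. The first method shows the racy check-then-put pattern, the second shows a single atomic `putIfAbsent()` call.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class DedupSketch {
    // Hypothetical in-memory stand-in for the distributed map cache.
    private final ConcurrentMap<String, Boolean> cache = new ConcurrentHashMap<>();

    /**
     * Racy variant: the existence check and the write are two separate calls,
     * so two workers can both see the key as missing before either writes it.
     */
    public boolean isDuplicateCheckThenPut(String recordKey) {
        if (cache.containsKey(recordKey)) {
            return true;
        }
        cache.put(recordKey, Boolean.TRUE);
        return false;
    }

    /**
     * Atomic variant: putIfAbsent() checks and writes in one operation.
     * On a ConcurrentMap it returns null only for the caller that inserted
     * the key, so any non-null result means the key was already present.
     */
    public boolean isDuplicateAtomic(String recordKey) {
        return cache.putIfAbsent(recordKey, Boolean.TRUE) != null;
    }

    public static void main(String[] args) {
        DedupSketch sketch = new DedupSketch();
        System.out.println(sketch.isDuplicateAtomic("record-key")); // prints "false"
        System.out.println(sketch.isDuplicateAtomic("record-key")); // prints "true"
    }
}
```

With the atomic variant, only one of several concurrent workers handling the same key would treat the record as a non-duplicate. The real cache client is remote and takes serializers, so its call shape differs from this stand-in.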
h2. Other minor changes
# Removed redundant classes and constants
# Improved test coverage
# Extracted repeated strings as constant members
> DeduplicateRecord does not add keys to distributed map cache
> ------------------------------------------------------------
>
> Key: NIFI-11945
> URL: https://issues.apache.org/jira/browse/NIFI-11945
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.23.0
> Environment: Docker
> Reporter: Peter Kimberley
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The `DeduplicateRecord` processor supports the use of a distributed map cache
> (DMC).
> After generating the record key, it checks for the existence of that key in
> the cache. It then calls `DistributedMapCacheClientWrapper::put()`, which, in
> this case, is a no-op. Therefore, a cache entry is never written and records
> are always routed to the `non-duplicate` relationship.
> The correct behaviour would be for
> `DistributedMapCacheClientWrapper::contains()` to call
> `DistributedMapCacheClient::putIfAbsent()`, which would atomically check/set
> the key in the target cache.
> An additional problem is an NPE when a DMC is used and the
> `DeduplicateRecord` property `Record Hashing Algorithm` is set to `NONE`.