[
https://issues.apache.org/jira/browse/NIFI-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069850#comment-15069850
]
Bryan Bende commented on NIFI-1316:
-----------------------------------
This looks straightforward. In testDuplicateNoCache, should there be a second
call to run() before you turn caching back on?
I may not understand the behavior correctly, but it seems like the first call
to run() will always produce a non-duplicate, so you need to run again to
prove that nothing was cached and that it still produces a non-duplicate, right?
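To illustrate why the second call matters, here is a minimal sketch of the detect/cache logic. It is not the processor's actual code: the class DuplicateDetectorSketch, its in-memory set, and the cacheIdentifier flag are all hypothetical stand-ins.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the duplicate check; not NiFi's actual implementation. */
class DuplicateDetectorSketch {
    // Assumption: a plain in-memory set stands in for the real cache.
    private final Set<String> cache = new HashSet<>();
    // Assumption: caching is on unless explicitly turned off.
    private boolean cacheIdentifier = true;

    void setCacheIdentifier(boolean cacheIdentifier) {
        this.cacheIdentifier = cacheIdentifier;
    }

    /** Returns true if the identifier was seen before; stores it only when caching is enabled. */
    boolean isDuplicate(String identifier) {
        if (cache.contains(identifier)) {
            return true;
        }
        if (cacheIdentifier) {
            cache.add(identifier);
        }
        return false;
    }
}
```

With caching off, the first isDuplicate("a") returns false whether or not the identifier gets stored, so it proves nothing by itself. Only a second call with caching still off, also returning false, shows that the first call left the cache untouched.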
Also, what do you think about calling the property "Cache Identifier" with a
default of "true", instead of "Do Not Cache Identifier" with a default of
"false"? Just wondering if it would be clearer for the default behavior, but
I'm probably just being picky.
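As a rough sketch of the positively named variant, assuming the configured properties arrive as a plain string map (the class and method names here are hypothetical, and this does not use NiFi's property API):

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a positively named flag; not NiFi's property API. */
class CacheIdentifierPropertySketch {
    static final String CACHE_IDENTIFIER = "Cache Identifier"; // name proposed above
    static final String DEFAULT_VALUE = "true";                // keeps today's behavior when unset

    /** Returns whether identifiers should be cached, falling back to the default when unset. */
    static boolean shouldCache(Map<String, String> configuredProperties) {
        String value = configuredProperties.getOrDefault(CACHE_IDENTIFIER, DEFAULT_VALUE);
        return Boolean.parseBoolean(value);
    }
}
```

An unset property then means "cache as before", and the call site reads as shouldCache(...) rather than a double negative such as !doNotCache.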
> Allow DetectDuplicate to only detect and not cache
> --------------------------------------------------
>
> Key: NIFI-1316
> URL: https://issues.apache.org/jira/browse/NIFI-1316
> Project: Apache NiFi
> Issue Type: Improvement
> Affects Versions: 0.4.1
> Reporter: Joseph Percivall
> Priority: Minor
> Attachments:
> 0001-NIFI-1316-adding-option-to-DetectDuplicate-to-not-ca.patch,
> WebCrawler.xml
>
>
> Working on a web crawler template/documentation, I find myself wanting to have
> a pair of DetectDuplicate processors. One does the typical check, cache, and
> remove if duplicate. The other should only check and remove if duplicate
> (without adding anything to the cache in that processor).
> The use case is that I want to add URLs to the cache after they have been
> successfully reached by the InvokeHttp processor. I would also like to check
> for URLs that were successfully reached before even sending them to the
> InvokeHttp processor, but I don't want to add them to the cache before
> InvokeHttp because it might not successfully reach the URL.
> I attached the template to the ticket. You can see how the DetectDuplicate
> going into InvokeHttp should only check for duplicates and not cache them
> (because the URL hasn't been successfully hit yet).
> Ideally, this improvement would only require adding a configuration option to
> the processor that controls whether or not to cache.
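The flow described in the issue can be sketched roughly as follows. This is a plain-Java simulation, not the attached template: CrawlFlowSketch, the shared set, and the fetchSucceeds predicate standing in for InvokeHttp are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical simulation of the two-stage crawl flow; not the attached template. */
class CrawlFlowSketch {
    // Assumption: one shared cache of URLs that were reached successfully.
    private final Set<String> reachedUrls = new HashSet<>();
    // Assumption: a predicate stands in for the InvokeHttp step (true = request succeeded).
    private final Predicate<String> fetchSucceeds;

    CrawlFlowSketch(Predicate<String> fetchSucceeds) {
        this.fetchSucceeds = fetchSucceeds;
    }

    /** Returns the URLs that were reached successfully for the first time in this batch. */
    List<String> crawl(List<String> candidateUrls) {
        List<String> newlyReached = new ArrayList<>();
        for (String url : candidateUrls) {
            // Stage 1: detect only. Drop URLs already reached, but do not cache yet.
            if (reachedUrls.contains(url)) {
                continue;
            }
            // Stand-in for InvokeHttp: a failed URL stays uncached so it can be retried.
            if (!fetchSucceeds.test(url)) {
                continue;
            }
            // Stage 2: detect and cache, only after a successful fetch.
            if (reachedUrls.add(url)) {
                newlyReached.add(url);
            }
        }
        return newlyReached;
    }
}
```

If the first stage cached as well, a URL whose request failed would be marked as seen and never retried, which is the situation the requested option is meant to avoid.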