[ 
https://issues.apache.org/jira/browse/NIFI-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069850#comment-15069850
 ] 

Bryan Bende commented on NIFI-1316:
-----------------------------------

This looks straightforward. In testDuplicateNoCache, should there be a second 
call to run() before you turn caching back on? 
I may not understand the behavior correctly, but it seems like the first call 
to run() will always produce a non-duplicate, so you need to run again to 
prove that it didn't cache anything and still produces a non-duplicate, right?

Also, what do you think about calling the property "Cache Identifier" with a 
default of "true", instead of "Do Not Cache Identifier" with a default of 
"false"? Just wondering if it would make the default behavior clearer, but 
I'm probably just being picky.
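
To make the two-run point concrete, here is a toy sketch. It is not the NiFi 
processor, its test, or the attached patch: DetectDuplicateSketch, 
setCacheIdentifier, and the in-memory set are hypothetical stand-ins, and the 
flag simply mirrors the suggested "Cache Identifier" property with a default 
of true. With caching off, the first run() reports a non-duplicate either way, 
so only the second run() shows that nothing was cached.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy stand-in for DetectDuplicate; NOT the NiFi processor or its API. */
public class DetectDuplicateSketch {
    // Hypothetical in-memory cache standing in for the real cache service.
    private final Set<String> cache = new HashSet<>();
    // Hypothetical flag mirroring the suggested "Cache Identifier" property.
    private boolean cacheIdentifier = true;

    public void setCacheIdentifier(boolean cacheIdentifier) {
        this.cacheIdentifier = cacheIdentifier;
    }

    /** Returns true if the identifier is already cached (a duplicate). */
    public boolean run(String id) {
        if (cache.contains(id)) {
            return true;
        }
        if (cacheIdentifier) {
            cache.add(id); // only remember the identifier when caching is on
        }
        return false;
    }

    public static void main(String[] args) {
        DetectDuplicateSketch d = new DetectDuplicateSketch();
        d.setCacheIdentifier(false);
        boolean first = d.run("url-1");  // non-duplicate regardless of caching
        boolean second = d.run("url-1"); // non-duplicate only if nothing was cached
        d.setCacheIdentifier(true);
        boolean third = d.run("url-1");  // still non-duplicate, but cached now
        boolean fourth = d.run("url-1"); // duplicate
        System.out.println(first + " " + second + " " + third + " " + fourth);
        // prints: false false false true
    }
}
```

If the no-cache branch accidentally cached the identifier, the second call 
would return true, which a single call could never reveal.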

> Allow DetectDuplicate to only detect and not cache
> --------------------------------------------------
>
>                 Key: NIFI-1316
>                 URL: https://issues.apache.org/jira/browse/NIFI-1316
>             Project: Apache NiFi
>          Issue Type: Improvement
>    Affects Versions: 0.4.1
>            Reporter: Joseph Percivall
>            Priority: Minor
>         Attachments: 
> 0001-NIFI-1316-adding-option-to-DetectDuplicate-to-not-ca.patch, 
> WebCrawler.xml
>
>
> While working on a web crawler template/documentation, I find myself wanting 
> a pair of DetectDuplicate processors. One does the typical check, cache, and 
> remove if duplicate. The other should only check and remove if duplicate 
> (without adding entries to the cache in that processor).
> The use case is that I want to add URLs to the cache only after the InvokeHttp 
> processor has reached them successfully. I would also like to check for URLs 
> that were already reached successfully before sending them to the InvokeHttp 
> processor, but I don't want to add them to the cache before InvokeHttp 
> because the request might not succeed.
> I attached the template to the ticket. You can see that the DetectDuplicate 
> going into InvokeHttp should only check for duplicates and not cache them 
> (because the URL hasn't been successfully hit yet).
> Ideally this improvement would only require a configuration option on 
> the processor that controls whether or not to cache. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
