[ 
https://issues.apache.org/jira/browse/NIFI-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064035#comment-17064035
 ] 

Jens M Kofoed edited comment on NIFI-7263 at 3/21/20, 7:35 PM:
---------------------------------------------------------------

[~waibani] Yes, you are right.

The problem with GetFile, when it does not delete the original, is that it 
loads all the content into NiFi. I would like the List processor just to list 
all files. I then apply my own logic to detect duplicate files (files that 
have been listed before). New files are distributed round-robin to the nodes 
in the cluster so the fetch processor can get each file. I route duplicate 
files to a dummy block that is normally disabled and has FlowFile expiration 
set, so these repeated files expire automatically. But if we need duplicate 
files to be fetched again, it is easy to enable this dummy block. 
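
The flow described above can be sketched in plain Python to make the routing 
explicit. This is only an illustration under assumptions, not NiFi code: the 
names DuplicateRouter, route and list_all are hypothetical, the "seen" set 
stands in for whatever state the duplicate-detection logic keeps, and a file 
is identified by its path alone.

```python
import itertools
import os
from dataclasses import dataclass, field


def list_all(directory):
    """List every regular file on each call, keeping no state between calls.

    Hypothetical stand-in for a List step with no tracking strategy.
    """
    return sorted(
        entry.path for entry in os.scandir(directory) if entry.is_file()
    )


@dataclass
class DuplicateRouter:
    """Hypothetical stand-in for the duplicate-detection flow; not a NiFi API."""

    nodes: list
    seen: set = field(default_factory=set)

    def __post_init__(self):
        # Cycle through the cluster nodes to hand out new files round-robin.
        self._next_node = itertools.cycle(self.nodes)

    def route(self, listing):
        """Split a listing into (node, path) pairs for new files and a list
        of paths that were already listed before (duplicates)."""
        new, duplicates = [], []
        for path in listing:
            if path in self.seen:
                duplicates.append(path)
            else:
                self.seen.add(path)
                new.append((next(self._next_node), path))
        return new, duplicates
```

In this sketch the duplicates list plays the role of the queue in front of the 
disabled dummy block: its entries would normally just expire, and passing them 
on to the fetch step corresponds to enabling that block.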


> Add a No tracking Strategy to ListFile/ListFTP
> ----------------------------------------------
>
>                 Key: NIFI-7263
>                 URL: https://issues.apache.org/jira/browse/NIFI-7263
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Jens M Kofoed
>            Priority: Major
>              Labels: ListFile, listftp
>
> ListFile/ListFTP has two Listing Strategies: Tracking Timestamps and 
> Tracking Entities.
> It would be very nice if the List processors could also have a No Tracking 
> (fix it yourself) strategy.
> When running NiFi in a cluster, List/Fetch is the perfect solution instead 
> of using GetFile. But we have had many cases where files in the pickup 
> folder have old timestamps, so there we have to use Tracking Entities.
> The issue arises in cases where you are not allowed to delete files but you 
> have to make a change to the file filter. Tracking Entities then starts all 
> over and lists all files again.
> In other situations we need to resend all data and would like to clear the 
> state of Tracking Entities. But you can't.
> So I have to build a small flow for detecting duplicates, and in some cases 
> just ignore duplicates and in other cases allow duplicates to be sent. But 
> it is a pain in the ... to use Tracking Entities.
> So a No Tracking strategy would be very nice.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
