[ 
https://issues.apache.org/jira/browse/NIFI-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tamas Palfy reassigned NIFI-8081:
---------------------------------

    Assignee: Tamas Palfy

> List[S]FTP can miss files when multiple subdirectories are written while 
> listing
> --------------------------------------------------------------------------------
>
>                 Key: NIFI-8081
>                 URL: https://issues.apache.org/jira/browse/NIFI-8081
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Tamas Palfy
>            Assignee: Tamas Palfy
>            Priority: Major
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> ListFTP and ListSFTP scan subdirectories one after the other, and because of 
> this they can have the following issue when using 'Tracking Timestamps' as 
> the 'Listing Strategy':
> # Processor starts and finishes listing directory1
> # Processor starts listing directory2
> # file1 arrives in directory1 with ts(timestamp)=1
> # file2 arrives in directory2 (or any other, not yet listed directory) with 
> ts=2
> # Processor finishes listing directory2
> # Processor returns result which will contain file2(ts=2) but not file1(ts=1)
> # Processor stores ts=2 as the latest seen timestamp
> # file1 will be filtered out next time (and in every subsequent listing) 
> because its timestamp is less than the latest seen timestamp
> Fix: Leave the 'Tracking Timestamps' behaviour as it is (just update the 
> documentation) and create a new strategy. This strategy checks the current 
> time in each cycle and lists all files that arrived before the current time 
> (but after the previous cycle). It compares file timestamps to the current 
> time, so the comparison needs to be adjusted for the timezone difference 
> between NiFi and the file hosting system.
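
The scenario and the proposed fix above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not NiFi code: RemoteFile, newerThan, inWindow and hostOffset are hypothetical names, the window is assumed to be half-open, [previousCycleTime, currentTime), and the current time is assumed to be read at the start of each cycle.

```java
import java.util.ArrayList;
import java.util.List;

class ListingStrategySketch {

    /** Hypothetical stand-in for a remote file: a name plus a last-modified timestamp. */
    static final class RemoteFile {
        final String name;
        final long timestamp;

        RemoteFile(String name, long timestamp) {
            this.name = name;
            this.timestamp = timestamp;
        }
    }

    /** 'Tracking Timestamps' as described: keep files newer than the latest seen timestamp. */
    static List<RemoteFile> newerThan(List<RemoteFile> listed, long latestSeen) {
        List<RemoteFile> kept = new ArrayList<>();
        for (RemoteFile file : listed) {
            if (file.timestamp > latestSeen) {
                kept.add(file);
            }
        }
        return kept;
    }

    /**
     * Assumed reading of the new strategy: keep files whose timestamp, shifted by the
     * (hypothetical) host offset, falls in [previousCycleTime, currentTime).
     */
    static List<RemoteFile> inWindow(List<RemoteFile> listed, long previousCycleTime,
                                     long currentTime, long hostOffset) {
        List<RemoteFile> kept = new ArrayList<>();
        for (RemoteFile file : listed) {
            long adjusted = file.timestamp + hostOffset;
            if (adjusted >= previousCycleTime && adjusted < currentTime) {
                kept.add(file);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        RemoteFile file1 = new RemoteFile("directory1/file1", 1L);
        RemoteFile file2 = new RemoteFile("directory2/file2", 2L);

        // Cycle 1 lists directory1 before file1 arrives, so it only sees file2.
        List<RemoteFile> seenInCycle1 = List.of(file2);
        // Cycle 2 sees both files.
        List<RemoteFile> seenInCycle2 = List.of(file1, file2);

        // Tracking Timestamps: cycle 1 emits file2 and stores its timestamp.
        long latestSeen = 0L;
        for (RemoteFile file : newerThan(seenInCycle1, latestSeen)) {
            latestSeen = Math.max(latestSeen, file.timestamp);
        }
        // Cycle 2 then filters out file1 because its timestamp is below latestSeen.
        System.out.println("Tracking Timestamps, cycle 2 emitted: "
                + newerThan(seenInCycle2, latestSeen).size());

        // Time window: cycle 1 covers [0, 1), cycle 2 covers [1, 3); no host offset.
        System.out.println("Window, cycle 1 emitted: "
                + inWindow(seenInCycle1, 0L, 1L, 0L).size());
        System.out.println("Window, cycle 2 emitted: "
                + inWindow(seenInCycle2, 1L, 3L, 0L).size());
    }
}
```

Because each window is bounded by the cycle's own clock rather than by the newest file seen, a file that was skipped by an in-progress listing still falls inside the next window. The cost is that a file is only emitted in the cycle after it arrives.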



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
