Tamas Palfy created NIFI-8081:
---------------------------------

             Summary: List[S]FTP can miss files when multiple subdirectories are written while listing
                 Key: NIFI-8081
                 URL: https://issues.apache.org/jira/browse/NIFI-8081
             Project: Apache NiFi
          Issue Type: Improvement
            Reporter: Tamas Palfy


ListFTP and ListSFTP scan subdirectories one after the other. Because of this,
they can miss files in the following scenario when 'Tracking Timestamps' is
used as the 'Listing Strategy':
# Processor starts and finishes listing directory1
# Processor starts listing directory2
# file1 arrives in directory1 with ts (timestamp)=1
# file2 arrives in directory2 (or any other directory not yet listed) with ts=2
# Processor finishes listing directory2
# Processor returns a result that contains file2 (ts=2) but not file1 (ts=1)
# Processor stores ts=2 as the latest seen timestamp
# file1 is filtered out in the next listing (and every subsequent one) because
its timestamp is less than the latest seen timestamp
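The sequence above can be reproduced with a minimal Java model. This is a simplified sketch, not NiFi's actual implementation: the class names, the `runCycle` method, and the rule "emit only files strictly newer than the largest timestamp seen so far" are assumptions standing in for the 'Tracking Timestamps' strategy, which may do more bookkeeping in practice.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model (hypothetical, not NiFi code) of the missed-file race. */
public class TrackingTimestampsRace {

    /** A listed remote file: its path and modification timestamp. */
    static final class RemoteFile {
        final String path;
        final long timestamp;

        RemoteFile(String path, long timestamp) {
            this.path = path;
            this.timestamp = timestamp;
        }
    }

    /** Assumed filter: emit only files strictly newer than the latest timestamp seen so far. */
    static final class Tracker {
        private long latestSeen = Long.MIN_VALUE; // nothing seen yet

        List<String> runCycle(List<RemoteFile> listed) {
            List<String> emitted = new ArrayList<>();
            long newLatest = latestSeen;
            for (RemoteFile f : listed) {
                if (f.timestamp > latestSeen) {
                    emitted.add(f.path);
                    newLatest = Math.max(newLatest, f.timestamp);
                }
            }
            latestSeen = newLatest; // persisted between cycles
            return emitted;
        }

        long latestSeen() {
            return latestSeen;
        }
    }

    public static void main(String[] args) {
        Tracker tracker = new Tracker();

        // Cycle 1: directory1 was listed while still empty; file1 (ts=1) and
        // file2 (ts=2) arrived afterwards, and only directory2 was listed late
        // enough to see its new file.
        List<RemoteFile> cycle1 = List.of(new RemoteFile("directory2/file2", 2));
        System.out.println("cycle 1 emitted: " + tracker.runCycle(cycle1));

        // Cycle 2: both files are visible now, but file1 is older than ts=2.
        List<RemoteFile> cycle2 = List.of(
                new RemoteFile("directory1/file1", 1),
                new RemoteFile("directory2/file2", 2));
        System.out.println("cycle 2 emitted: " + tracker.runCycle(cycle2));
    }
}
```

Running `main` should print `cycle 1 emitted: [directory2/file2]` followed by `cycle 2 emitted: []`: once ts=2 is stored, file1 (ts=1) can never pass the filter.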

Fix: Leave the 'Tracking Timestamps' behaviour as it is (just update the
documentation) and create a new strategy. In each cycle, this strategy checks
the current time and lists all files that arrived before that time but after
the time recorded in the previous cycle. Because it compares file timestamps to
the current time on the NiFi side, the comparison needs to be adjusted for the
timezone difference between NiFi and the file hosting system.
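A minimal sketch of the proposed strategy, under stated assumptions: `TimeWindowListing`, `runCycle`, and `timeAdjustment` are hypothetical names (not an existing NiFi API), `timeAdjustment` models the clock/timezone offset of the file host relative to NiFi, and the caller captures `nifiNow` before it starts listing any directory. Each cycle emits only files whose timestamp falls in `(previousCutoff, cutoff]`, so a file that appears while directories are still being scanned is deferred to the next cycle instead of advancing the stored timestamp past older, unseen files.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Sketch (hypothetical names, not NiFi code) of a time-window listing strategy. */
public class TimeWindowListing {

    static final class RemoteFile {
        final String path;
        final Instant timestamp; // as reported by the file hosting system

        RemoteFile(String path, Instant timestamp) {
            this.path = path;
            this.timestamp = timestamp;
        }
    }

    private final Duration timeAdjustment; // assumed offset: remote clock minus NiFi clock
    private Instant previousCutoff = Instant.MIN; // nothing listed yet

    TimeWindowListing(Duration timeAdjustment) {
        this.timeAdjustment = timeAdjustment;
    }

    /**
     * nifiNow must be captured before any directory is listed in this cycle.
     * Emits files with previousCutoff < timestamp <= cutoff.
     */
    List<String> runCycle(Instant nifiNow, List<RemoteFile> listed) {
        // Shift NiFi's clock into the file host's time so both sides are comparable.
        Instant cutoff = nifiNow.plus(timeAdjustment);
        List<String> emitted = new ArrayList<>();
        for (RemoteFile f : listed) {
            boolean afterPrevious = f.timestamp.isAfter(previousCutoff);
            boolean notAfterCutoff = !f.timestamp.isAfter(cutoff);
            if (afterPrevious && notAfterCutoff) {
                emitted.add(f.path);
            }
            // Files newer than the cutoff are skipped now and picked up next cycle.
        }
        previousCutoff = cutoff;
        return emitted;
    }

    public static void main(String[] args) {
        TimeWindowListing listing = new TimeWindowListing(Duration.ZERO);

        // Cycle 1: cutoff taken at t=0; file2 (ts=2) is seen in directory2 but is
        // beyond the cutoff, and file1 (ts=1) was missed in directory1.
        System.out.println(listing.runCycle(Instant.ofEpochSecond(0),
                List.of(new RemoteFile("directory2/file2", Instant.ofEpochSecond(2)))));

        // Cycle 2: cutoff t=3; both files fall into the window (0, 3].
        System.out.println(listing.runCycle(Instant.ofEpochSecond(3),
                List.of(new RemoteFile("directory1/file1", Instant.ofEpochSecond(1)),
                        new RemoteFile("directory2/file2", Instant.ofEpochSecond(2)))));
    }
}
```

Replaying the scenario with the cutoff captured at t=0 and the next cycle at t=3 should print `[]` and then `[directory1/file1, directory2/file2]`, so file1 is no longer lost. The sketch assumes precise remote timestamps; with coarse timestamps (some FTP servers report only minutes), a file stamped at or just before the cutoff but written after its directory was scanned could still be skipped, so a real implementation would probably need some safety margin.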



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
