[ https://issues.apache.org/jira/browse/NIFI-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17278923#comment-17278923 ]

ASF subversion and git services commented on NIFI-8081:
-------------------------------------------------------

Commit b55998afc18e6765204bac5493f29c47c9f66f9a in nifi's branch 
refs/heads/main from Tamas Palfy
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=b55998a ]

NIFI-8081 Added new Listing Strategy to ListFTP and ListSFTP: Time Window

NIFI-8081 Added new Listing Strategy to ListFTP and ListSFTP: Adjusted Time 
Window. User can specify the time zone or time difference (compared to where 
NiFi runs) of the system hosting the files, and based on that the processor 
calculates the current time there. Lists files modified before this adjusted 
current time (and after the last listing).
NIFI-8081 Validated that 'Time Adjustment' is not set if the listing strategy is 
not 'Adjusted Time Window'. Extracted validator to a separate class. Added more 
tests. Minor refactor. Typo fix.
NIFI-8081 Improved validation.
NIFI-8081 'Time Adjustment' is not necessary; in fact it can cause problems. 
SFTP (and usually FTP, which has a more general bug at the moment) returns a 
timestamp that doesn't really need adjustment. (SFTP in particular returns an 
'epoch' time.) Everything else remains the same: the new listing strategy relies 
on a sliding time window, but without the unnecessary option to adjust the 
modification time.
NIFI-8081 Resolved conflicts after rebasing to main.
NIFI-8081 Renamed 'AbstractListProcessor.listByAdjustedSlidingTimeWindow' to 
'listByTimeWindow'. Post main rebase correction.
NIFI-8081 Updated user doc for the BY_TIME_WINDOW strategy to warn users of its 
reliance on accurate time.

This closes #4721.

Signed-off-by: Peter Turcsanyi <[email protected]>
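The commit describes the new strategy as a sliding time window: each cycle lists files modified after the previous listing and before the current time. Below is a minimal Java sketch of that idea, assuming the window is (previous upper bound, cycle start] and that the cycle start is captured before the directory scan begins. The class `TimeWindowListing`, the record `RemoteFile` and the method `select` are hypothetical illustrations, not NiFi's `AbstractListProcessor.listByTimeWindow`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a sliding time-window listing filter.
 * Not NiFi's AbstractListProcessor.listByTimeWindow; the class, record and
 * window bounds here are illustrative assumptions.
 */
public class TimeWindowListing {

    /** Minimal stand-in for a remote file entry (hypothetical type). */
    record RemoteFile(String path, long lastModifiedMillis) {}

    /** Upper bound of the previous window; the next window starts just after it. */
    private long previousUpperBound;

    public TimeWindowListing(long initialLowerBound) {
        this.previousUpperBound = initialLowerBound;
    }

    /**
     * Keeps files with previousUpperBound < lastModified <= cycleStart, then
     * slides the window forward. cycleStart is meant to be captured before the
     * directory scan begins, so a file written during the scan carries a later
     * timestamp and is picked up by the next cycle instead of being skipped.
     */
    public List<RemoteFile> select(List<RemoteFile> scanned, long cycleStart) {
        List<RemoteFile> selected = new ArrayList<>();
        for (RemoteFile file : scanned) {
            long ts = file.lastModifiedMillis();
            if (ts > previousUpperBound && ts <= cycleStart) {
                selected.add(file);
            }
        }
        previousUpperBound = cycleStart;
        return selected;
    }

    public static void main(String[] args) {
        TimeWindowListing listing = new TimeWindowListing(0L);
        // Both files are visible to the scans; dir2/b was written (t=12)
        // after cycle 1 started (t=10).
        List<RemoteFile> scan = List.of(
                new RemoteFile("dir1/a", 5),
                new RemoteFile("dir2/b", 12));
        System.out.println("cycle 1 (0, 10]:  " + listing.select(scan, 10));
        System.out.println("cycle 2 (10, 20]: " + listing.select(scan, 20));
    }
}
```

Cycle 1 returns only `dir1/a`; `dir2/b` is deferred to cycle 2 rather than lost. As the updated user doc warns, this only holds if the remote modification times and NiFi's clock agree: if the remote host's clock lags behind NiFi's, a file written mid-scan into an already-scanned directory can get a timestamp inside a window that has already closed.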


> List[S]FTP can miss files when multiple subdirectories are written while 
> listing
> --------------------------------------------------------------------------------
>
>                 Key: NIFI-8081
>                 URL: https://issues.apache.org/jira/browse/NIFI-8081
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Tamas Palfy
>            Assignee: Tamas Palfy
>            Priority: Major
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> ListFTP and ListSFTP scan subdirectories one after the other, and because of 
> this they can have the following issue when using 'Tracking Timestamps' as 
> 'Listing Strategy':
> # Processor starts and finishes listing directory1
> # Processor starts listing directory2
> # file1 arrives in directory1 with ts(timestamp)=1
> # file2 arrives in directory2 (or any other, not yet listed directory) with 
> ts=2
> # Processor finishes listing directory2
> # Processor returns result which will contain file2(ts=2) but not file1(ts=1)
> # Processor stores ts=2 as the latest seen timestamp
> # file1 will be filtered out next time (and in every subsequent listing) 
> because its timestamp is less than the latest seen timestamp
> Fix: Leave 'Tracking Timestamps' behaviour as it is (just update the 
> documentation) and create a new strategy. This strategy checks the current 
> time in each cycle and lists all files that have arrived before the current 
> time (but after the previous cycle). Because it compares file timestamps to 
> the current time, that time needs to be adjusted by the time zone difference 
> between NiFi and the file-hosting system.
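The numbered scenario above can be reproduced with a small Java simulation. It is a simplified model, assumed for illustration only: 'Tracking Timestamps' is reduced to "emit files newer than the latest seen timestamp" (the real strategy contains more logic), timestamps are abstract integers, and the class `TrackingTimestampRace` and its methods are hypothetical. The same two scans are fed to a time-window filter for comparison.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of the NIFI-8081 race with abstract timestamps. */
public class TrackingTimestampRace {

    record RemoteFile(String name, long ts) {}

    /** Cycle 1: directory1 was scanned before file1 (ts=1) arrived; file2 (ts=2) was seen. */
    static final List<RemoteFile> CYCLE1_SCAN = List.of(new RemoteFile("dir2/file2", 2));

    /** Cycle 2: both files are visible. */
    static final List<RemoteFile> CYCLE2_SCAN = List.of(
            new RemoteFile("dir1/file1", 1), new RemoteFile("dir2/file2", 2));

    /** Simplified 'Tracking Timestamps': emit files newer than the latest seen timestamp. */
    static List<String> trackingTimestamps() {
        List<String> emitted = new ArrayList<>();
        long latestSeen = Long.MIN_VALUE;
        for (List<RemoteFile> scan : List.of(CYCLE1_SCAN, CYCLE2_SCAN)) {
            long maxThisCycle = latestSeen;
            for (RemoteFile f : scan) {
                if (f.ts() > latestSeen) {
                    emitted.add(f.name());
                    maxThisCycle = Math.max(maxThisCycle, f.ts());
                }
            }
            latestSeen = maxThisCycle; // stores ts=2 after cycle 1, so file1 is filtered forever
        }
        return emitted;
    }

    /** Time window: emit files with previousUpperBound < ts <= cycleStart. */
    static List<String> timeWindow() {
        List<String> emitted = new ArrayList<>();
        long[] cycleStarts = {0, 3}; // assumed to be captured before each scan begins
        List<List<RemoteFile>> scans = List.of(CYCLE1_SCAN, CYCLE2_SCAN);
        long previousUpperBound = Long.MIN_VALUE;
        for (int i = 0; i < scans.size(); i++) {
            for (RemoteFile f : scans.get(i)) {
                if (f.ts() > previousUpperBound && f.ts() <= cycleStarts[i]) {
                    emitted.add(f.name());
                }
            }
            previousUpperBound = cycleStarts[i];
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println("Tracking Timestamps emitted: " + trackingTimestamps());
        System.out.println("Time window emitted:         " + timeWindow());
    }
}
```

In this model the tracking-timestamps run emits only `dir2/file2`, so file1 is never listed, while the time-window run emits nothing in cycle 1 (both files arrived after the cycle started) and both files in cycle 2.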



--
This message was sent by Atlassian Jira
(v8.3.4#803005)