Joseph Witt created NIFI-1484:
---------------------------------

             Summary: ListFile holds unbounded list of files with matching timestamps
                 Key: NIFI-1484
                 URL: https://issues.apache.org/jira/browse/NIFI-1484
             Project: Apache NiFi
          Issue Type: Bug
          Components: Core UI, Extensions
    Affects Versions: 0.4.0, 0.5.0
            Reporter: Joseph Witt


ListFile appears to hold an unbounded set of filenames that match the last 
timestamp.  While this is understandable as a way to handle the edge case of 
new data arriving with that same timestamp, it presents two problems.  First, 
we hold all of this information in state management, which could put 
considerable pressure on both the local and remote stores, and we also hold it 
in memory before we persist it.

Second, the entire state listing appears to show up in the UI without 
pagination or any limit on the number of entries.  This seems like a problem 
for the client side as well.  The server side should probably restrict this.

Finally, it seems that saving the filenames seen at a given timestamp is only 
necessary if we're assuming the listing we do is 'as-of' RIGHT NOW.  What if 
instead we did the listing based on a last modified time of 'RIGHT NOW' minus 
1 millisecond or something like that?  Then we should not have to worry at all 
about keeping a listing of names for the timestamp.
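
A minimal sketch of that idea follows, to make the proposal concrete.  It is 
not taken from the ListFile code: the class LaggedListingSketch, the Entry and 
Listing types, and the lagMillis parameter are all hypothetical names used 
only for illustration.  It assumes a file's last modified time reflects when 
it arrived, so that no new file can later appear with a timestamp at or below 
the cutoff.  Under that assumption the only state to persist is a single long.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the NiFi codebase.
final class LaggedListingSketch {

    /** Stand-in for a directory entry: a name plus its last-modified time in ms. */
    static final class Entry {
        final String name;
        final long lastModified;

        Entry(String name, long lastModified) {
            this.name = name;
            this.lastModified = lastModified;
        }
    }

    /** Result of one pass: the names to emit and the single long to persist. */
    static final class Listing {
        final List<String> names;
        final long watermark;

        Listing(List<String> names, long watermark) {
            this.names = names;
            this.watermark = watermark;
        }
    }

    /**
     * Lists entries modified after the previous watermark and no later than
     * (now - lagMillis). Entries newer than the cutoff wait for a later pass,
     * so no per-timestamp set of filenames has to be remembered.
     */
    static Listing list(List<Entry> dir, long watermark, long now, long lagMillis) {
        long cutoff = now - lagMillis;
        List<String> names = new ArrayList<>();
        long newWatermark = watermark;
        for (Entry e : dir) {
            if (e.lastModified > watermark && e.lastModified <= cutoff) {
                names.add(e.name);
                newWatermark = Math.max(newWatermark, e.lastModified);
            }
        }
        return new Listing(names, newWatermark);
    }
}
```

With a lag of 1 millisecond, files stamped exactly 'RIGHT NOW' are skipped on 
this pass and picked up on the next one, once the cutoff has moved past them.  
If the filesystem's timestamp granularity is coarser than the lag, the lag 
would presumably need to be at least that granularity.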

The reason I think this is important is that it is not at all uncommon for a 
directory with large quantities of files to have many files sharing the same 
timestamp, for example after a copy operation that does not preserve the 
original file attributes.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
