[
https://issues.apache.org/jira/browse/NIFI-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140404#comment-15140404
]
Aldrin Piri commented on NIFI-1484:
-----------------------------------
Thanks for the stacktraces. Hadn't tried it across restarts, but will play
around with that a bit.
> ListFile holds unbounded list of files with matching time stamps
> ----------------------------------------------------------------
>
> Key: NIFI-1484
> URL: https://issues.apache.org/jira/browse/NIFI-1484
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core UI, Extensions
> Affects Versions: 0.4.0, 0.5.0
> Reporter: Joseph Witt
> Assignee: Aldrin Piri
> Fix For: 0.5.0
>
>
> ListFile appears to hold an unbounded set of filenames that match the last
> timestamp. While this is understandable as a way to handle the edge case of
> new data arriving at the same time, it presents two problems. First, we hold
> all of this information in state management, which could put considerable
> pressure on both the local and remote stores. Second, we also hold it in
> memory before we persist it.
> Also, the entire state listing appears to show up in the UI without
> pagination or any limit on number of entries. This seems like a problem for
> the client-side as well. The server side should probably restrict this.
> Finally, it seems that saving the filenames seen at a given timestamp is
> only necessary if we assume the listing we do is 'as-of' RIGHT NOW. What if
> we instead did the listing based on a last modified time of
> 'RIGHTNOW'-1 millisecond or something like that? Then we should not have to
> worry at all about keeping a listing of names for the timestamp.
> The reason I think this is important is that it is not at all uncommon for a
> directory with large quantities of files to have many files sharing the same
> timestamp, because a copy operation did not preserve the original file
> attributes.
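
A minimal sketch of the 'RIGHTNOW'-1 millisecond idea from the description, for illustration only. It is not the ListFile implementation or the eventual fix; the class, the Entry type, and listWindow are hypothetical names. Each run lists only files whose last modified time falls in (lastCutoff, now - 1], so the only state to persist is a single timestamp rather than a set of filenames. It assumes no file will later appear with a timestamp at or below the cutoff, which may not hold if the filesystem's timestamp granularity is coarser than the 1 ms lag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: list files in a half-open time window so that no
// per-timestamp set of filenames has to be kept in state.
final class CutoffListingSketch {

    // Hypothetical stand-in for a directory entry.
    static final class Entry {
        final String name;
        final long lastModified; // epoch milliseconds

        Entry(String name, long lastModified) {
            this.name = name;
            this.lastModified = lastModified;
        }
    }

    // Returns entries with lastCutoff < lastModified <= now - 1.
    // The caller would persist (now - 1) as the next lastCutoff.
    static List<Entry> listWindow(List<Entry> directory, long lastCutoff, long now) {
        long cutoff = now - 1; // lag the listing by 1 ms, per the suggestion
        List<Entry> result = new ArrayList<>();
        for (Entry e : directory) {
            if (e.lastModified > lastCutoff && e.lastModified <= cutoff) {
                result.add(e);
            }
        }
        return result;
    }
}
```

With this shape, a bulk copy that stamps thousands of files with the same time is picked up in one window, and the next run starts strictly after that window's cutoff.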