[ https://issues.apache.org/jira/browse/NIFI-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141007#comment-15141007 ]

ASF subversion and git services commented on NIFI-1484:
-------------------------------------------------------

Commit 1a512cd1e67e6d0231e9dcde9d32472fad4c5bd2 in nifi's branch 
refs/heads/master from [~aldrin]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=1a512cd ]

NIFI-1484 Making use of timestamps at various points of execution to provide a
listing of all but the latest files, which are held until a subsequent execution.

Correcting nifi-amqp-nar bundle's pom description.

This closes #212.
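
The approach in the commit message can be illustrated with a small sketch. The idea is to emit every file older than the newest timestamp seen, hold the files that share the newest timestamp, and release them on a later run. The persisted state then shrinks from a set of filenames to two timestamps. This is a hypothetical illustration, not the code in commit 1a512cd. The class, record, and method names are invented, and so is the release rule: the held batch is emitted once a following run sees no newer timestamp.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "list all but the latest files and hold those until a
// subsequent execution"; names are illustrative, not NiFi's ListFile API.
public class HoldLatestListing {
    // Hypothetical minimal view of a file found in the listed directory.
    public record FileEntry(String name, long lastModified) {}

    // Bounded state: two timestamps instead of a set of filenames.
    private long lastEmittedTimestamp = Long.MIN_VALUE; // newest timestamp already emitted
    private long heldTimestamp = Long.MIN_VALUE;        // newest timestamp held back by the previous run

    public List<FileEntry> list(List<FileEntry> directory) {
        // Candidates are files newer than anything emitted so far.
        List<FileEntry> candidates = new ArrayList<>();
        long newest = Long.MIN_VALUE;
        for (FileEntry f : directory) {
            if (f.lastModified() > lastEmittedTimestamp) {
                candidates.add(f);
                newest = Math.max(newest, f.lastModified());
            }
        }
        if (candidates.isEmpty()) {
            return List.of();
        }
        // Assumed release rule: if the newest timestamp was already held back on the
        // previous run, a full execution interval has passed, so release it too.
        boolean releaseNewest = newest == heldTimestamp;
        List<FileEntry> emitted = new ArrayList<>();
        long maxEmitted = lastEmittedTimestamp;
        for (FileEntry f : candidates) {
            if (f.lastModified() < newest || releaseNewest) {
                emitted.add(f);
                maxEmitted = Math.max(maxEmitted, f.lastModified());
            }
        }
        lastEmittedTimestamp = maxEmitted;
        heldTimestamp = releaseNewest ? Long.MIN_VALUE : newest;
        return emitted;
    }
}
```

Consider a directory holding a@100, b@200, and c@200. The first run emits only a and holds the 200 batch. If d@200 arrives before the next run, that run emits b, c, and d together, because the newest timestamp has not advanced.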


> ListFile holds unbounded list of files with matching time stamps
> ----------------------------------------------------------------
>
>                 Key: NIFI-1484
>                 URL: https://issues.apache.org/jira/browse/NIFI-1484
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core UI, Extensions
>    Affects Versions: 0.4.0, 0.5.0
>            Reporter: Joseph Witt
>            Assignee: Aldrin Piri
>             Fix For: 0.5.0
>
>         Attachments: 0001-NIFI-1484-fixing-checkstyle-issue.patch
>
>
> ListFile appears to hold an unbounded set of filenames that share the most
> recent listed timestamp. While this is understandable as a way to handle the
> edge case of new data arriving with that same timestamp, it presents two
> problems. First, we hold all of this information in state management, which
> could put considerable pressure on both the local and remote stores, and we
> also hold it in memory before we persist it.
>
> Second, the entire state listing appears to show up in the UI without
> pagination or any limit on the number of entries. This seems like a problem
> for the client side as well; the server side should probably restrict this.
>
> Finally, saving the filenames seen at a given timestamp seems necessary only
> if we assume the listing we do is 'as of' RIGHT NOW. What if instead we only
> listed files whose last modified time is at or before 'RIGHT NOW' minus 1
> millisecond, or something like that? Then we should not have to worry at all
> about keeping a listing of names for the timestamp.
>
> The reason I think this is important is that it is not at all uncommon for a
> directory with a large number of files to have many of them share the same
> timestamp, because a copy operation did not preserve the original file
> attributes.
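
The reporter's alternative is to list only files modified at or before "RIGHT NOW" minus a small lag. Under that scheme, the single newest emitted timestamp is the only state to keep. A minimal sketch follows, with hypothetical names and an injectable clock so it can be exercised without real files. The lag is a parameter rather than a fixed 1 ms. This is an assumption on our part: some file systems record modification times more coarsely (FAT, for example, uses 2-second resolution), so the lag would need to cover that granularity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical sketch of listing "as of RIGHT NOW - lag"; not NiFi code.
public class LaggedListing {
    // Hypothetical minimal view of a file found in the listed directory.
    public record FileEntry(String name, long lastModified) {}

    private final long lagMillis;     // 1 ms in the proposal; may need to match mtime granularity
    private final LongSupplier clock; // current time in ms, injectable for testing
    private long lastEmittedTimestamp = Long.MIN_VALUE; // the only state kept between runs

    public LaggedListing(long lagMillis, LongSupplier clock) {
        this.lagMillis = lagMillis;
        this.clock = clock;
    }

    public List<FileEntry> list(List<FileEntry> directory) {
        // Files modified after the cutoff could still be joined by more files with
        // the same timestamp, so they are left for a later run.
        long cutoff = clock.getAsLong() - lagMillis;
        List<FileEntry> emitted = new ArrayList<>();
        long maxEmitted = lastEmittedTimestamp;
        for (FileEntry f : directory) {
            long t = f.lastModified();
            if (t > lastEmittedTimestamp && t <= cutoff) {
                emitted.add(f);
                maxEmitted = Math.max(maxEmitted, t);
            }
        }
        lastEmittedTimestamp = maxEmitted;
        return emitted;
    }
}
```

The trade-off in this sketch is that a file appearing later with a modification time at or before the last emitted timestamp is never picked up. An example is a copy that does preserve the original attributes.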



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
