[ https://issues.apache.org/jira/browse/NIFI-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398906#comment-16398906 ]
Bryan Bende commented on NIFI-2853:
-----------------------------------
[~sivaprasanna] I'm not sure this scenario is accurate...
Any time the directory or filter is changed in the processor, the state
tracking resets:
[https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/ListHDFS.java#L204-L206]
So in your example, when you changed from "/tmp/sub-dir" to "/tmp", it would
reset and pick up everything again.
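The reset behavior described above can be modeled in plain Java. This is a hypothetical sketch, not the ListHDFS source linked above: the class `ListingResetSketch` and its methods are invented for illustration, and a `HashMap` stands in for NiFi's state management. It shows why the "/tmp/sub-dir" to "/tmp" change picks everything up again: once the stored timestamps are cleared, the next listing has no earlier timestamp to compare against.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical model of the reset-on-change behavior; not the actual ListHDFS code.
final class ListingResetSketch {
    private String directory;
    private String fileFilter;
    // Stand-in for the processor's persisted state (key names taken from NIFI-2853).
    private final Map<String, Long> state = new HashMap<>();

    ListingResetSketch(String directory, String fileFilter) {
        this.directory = directory;
        this.fileFilter = fileFilter;
    }

    // Changing the directory clears the tracked timestamps.
    void setDirectory(String newDirectory) {
        if (!Objects.equals(directory, newDirectory)) {
            directory = newDirectory;
            state.clear();
        }
    }

    // Changing the file filter clears the tracked timestamps as well.
    void setFileFilter(String newFilter) {
        if (!Objects.equals(fileFilter, newFilter)) {
            fileFilter = newFilter;
            state.clear();
        }
    }

    // Record the two timestamps a completed listing would persist.
    void recordListing(long listingTimestamp, long emittedTimestamp) {
        state.put("listing.timestamp", listingTimestamp);
        state.put("emitted.timestamp", emittedTimestamp);
    }

    Map<String, Long> getState() {
        return state;
    }

    public static void main(String[] args) {
        ListingResetSketch p = new ListingResetSketch("/tmp/sub-dir", ".*");
        p.recordListing(1000L, 900L);
        System.out.println(p.getState().size());    // 2
        p.setDirectory("/tmp");                     // the change from the example
        System.out.println(p.getState().isEmpty()); // true
    }
}
```

Setting a property to its current value leaves the state alone in this model; only an actual change clears it.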
I don't think there is anything to change about the state tracking. We can
either close the JIRA or apply the default scheduling strategy that Pierre
suggested.
> Improve ListHDFS state tracking
> -------------------------------
>
> Key: NIFI-2853
> URL: https://issues.apache.org/jira/browse/NIFI-2853
> Project: Apache NiFi
> Issue Type: Improvement
> Affects Versions: 1.0.0
> Reporter: Bryan Bende
> Priority: Minor
>
> Currently ListHDFS tracks two properties in state management,
> "listing.timestamp" and "emitted.timestamp". In the 1.0.0 release, the
> directory property now supports expression language, which means the directory
> being listed could dynamically change on any execution of the processor.
> The processor should be changed to store state specific to the directory that
> was listed, for example "listing.timestamp.dir1" and "emitted.timestamp.dir1".
> This would also help in a clustered scenario: currently ListHDFS has to be
> run on the primary node only; otherwise each node will overwrite the other
> nodes' state and produce unexpected results. With the above improvement, if
> the directory evaluated to a unique path for each node, it would store the
> state of each of those paths separately.
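The per-directory keys proposed above can be sketched in plain Java. This is a hypothetical illustration, not ListHDFS code: the class `PerDirectoryListingState`, its methods, and the paths `/data/node1` and `/data/node2` are invented, and a `HashMap` stands in for NiFi's state management. Each directory gets its own `listing.timestamp.<dir>` and `emitted.timestamp.<dir>` entries, so recording one directory does not overwrite another.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the NIFI-2853 proposal; a HashMap stands in for NiFi state.
final class PerDirectoryListingState {
    private static final String LISTING_PREFIX = "listing.timestamp.";
    private static final String EMITTED_PREFIX = "emitted.timestamp.";

    // Values kept as strings, as in a string-to-string key/value store (assumption).
    private final Map<String, String> state = new HashMap<>();

    // Store both timestamps under keys suffixed with the directory that was listed.
    void record(String directory, long listingTimestamp, long emittedTimestamp) {
        state.put(LISTING_PREFIX + directory, Long.toString(listingTimestamp));
        state.put(EMITTED_PREFIX + directory, Long.toString(emittedTimestamp));
    }

    // Last listing timestamp for this directory, or -1 if it was never listed.
    long listingTimestamp(String directory) {
        return read(LISTING_PREFIX + directory);
    }

    // Last emitted timestamp for this directory, or -1 if it was never listed.
    long emittedTimestamp(String directory) {
        return read(EMITTED_PREFIX + directory);
    }

    Map<String, String> snapshot() {
        return Collections.unmodifiableMap(state);
    }

    private long read(String key) {
        String value = state.get(key);
        return value == null ? -1L : Long.parseLong(value);
    }

    public static void main(String[] args) {
        PerDirectoryListingState s = new PerDirectoryListingState();
        // e.g. each node's directory expression evaluates to its own (hypothetical) path
        s.record("/data/node1", 2000L, 1900L);
        s.record("/data/node2", 3000L, 2950L);
        System.out.println(s.listingTimestamp("/data/node1")); // 2000
        System.out.println(s.listingTimestamp("/data/node2")); // 3000
        System.out.println(s.listingTimestamp("/tmp"));        // -1
        System.out.println(s.snapshot().size());               // 4
    }
}
```

The `-1` sentinel for a never-listed directory is an arbitrary choice for this sketch.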