[ https://issues.apache.org/jira/browse/NIFI-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583302#comment-15583302 ]
Bryan Bende commented on NIFI-2853:
-----------------------------------

[~joewitt] Sticking with running ListHDFS only on the primary node makes sense to me.

Do you think we should consider a @PrimaryNodeOnly annotation so that the framework can prevent certain processors from being scheduled on all nodes? Or maybe that is overkill and we should just ensure proper documentation?

> Improve ListHDFS state tracking
> -------------------------------
>
>                 Key: NIFI-2853
>                 URL: https://issues.apache.org/jira/browse/NIFI-2853
>             Project: Apache NiFi
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Bryan Bende
>            Priority: Minor
>
> Currently ListHDFS tracks two properties in state management,
> "listing.timestamp" and "emitted.timestamp". In the 1.0.0 release, the
> directory property now supports expression language, which means the directory
> being listed could dynamically change on any execution of the processor.
> The processor should be changed to store state specific to the directory that
> was listed, for example "listing.timestamp.dir1" and "emitted.timestamp.dir1".
> This would also help in a clustered scenario. Currently ListHDFS has to be
> run on the primary node only; otherwise each node will overwrite the other
> nodes' state and produce unexpected results. With the above improvement, if the
> directory evaluated to a unique path for each node, the processor would store
> the state of each of those paths separately.
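
To make the proposed keying concrete, here is a minimal sketch in plain Java of state stored per listed directory. It is not the ListHDFS implementation and does not use the NiFi state management API: the class `DirectoryScopedState`, its methods, and the in-memory map standing in for the processor's state are all hypothetical. Only the key names "listing.timestamp" and "emitted.timestamp" and the ".dir1" suffix convention come from the issue description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: an in-memory map stands in for the
// processor's managed state.
class DirectoryScopedState {
    // Key prefixes taken from the issue description.
    static final String LISTING_PREFIX = "listing.timestamp.";
    static final String EMITTED_PREFIX = "emitted.timestamp.";

    private final Map<String, String> state = new HashMap<>();

    // Build the per-directory keys, e.g. "listing.timestamp.dir1".
    static String listingKey(String directory) {
        return LISTING_PREFIX + directory;
    }

    static String emittedKey(String directory) {
        return EMITTED_PREFIX + directory;
    }

    // Store both timestamps under keys scoped to the listed directory,
    // so listing a different directory never overwrites these entries.
    void record(String directory, long listingTimestamp, long emittedTimestamp) {
        state.put(listingKey(directory), Long.toString(listingTimestamp));
        state.put(emittedKey(directory), Long.toString(emittedTimestamp));
    }

    long lastListing(String directory) {
        return read(listingKey(directory));
    }

    long lastEmitted(String directory) {
        return read(emittedKey(directory));
    }

    // Returns -1 when nothing has been recorded for the key yet.
    private long read(String key) {
        String value = state.get(key);
        return value == null ? -1L : Long.parseLong(value);
    }
}
```

With this shape, two executions whose directory expression evaluates to "dir1" and "dir2" keep independent timestamps, which is the behavior the issue asks for. It does not address nodes that evaluate to the same path, so running on the primary node only would still be needed in that case.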