Bryan Bende commented on NIFI-2853:

[~joewitt] Sticking with running ListHDFS only on the primary node makes sense to me.

Do you think we should consider a @PrimaryNodeOnly annotation so that the 
framework can prevent certain processors from being scheduled on all nodes? Or 
maybe that is overkill and we should just ensure proper documentation?
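
To make the annotation idea concrete, here is a rough sketch of how such a check might look. Everything in it is hypothetical and defined locally for illustration: the PrimaryNodeOnly annotation, the ExecutionNode enum, the two example processor classes, and the isSchedulingAllowed helper are assumptions, not existing framework API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class PrimaryNodeOnlySketch {

    // Hypothetical marker annotation; RUNTIME retention so it is visible to reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface PrimaryNodeOnly {}

    // Hypothetical scheduling choices a user could request for a processor.
    enum ExecutionNode { ALL, PRIMARY }

    // Stand-in for a listing processor that must not run on every node.
    @PrimaryNodeOnly
    static class ListingProcessor {}

    // Stand-in for a processor with no restriction.
    static class OrdinaryProcessor {}

    // Rejects ALL-node scheduling for annotated classes; allows everything else.
    static boolean isSchedulingAllowed(Class<?> processorClass, ExecutionNode requested) {
        boolean primaryOnly = processorClass.isAnnotationPresent(PrimaryNodeOnly.class);
        return !(primaryOnly && requested == ExecutionNode.ALL);
    }

    public static void main(String[] args) {
        System.out.println(isSchedulingAllowed(ListingProcessor.class, ExecutionNode.ALL));
        System.out.println(isSchedulingAllowed(ListingProcessor.class, ExecutionNode.PRIMARY));
        System.out.println(isSchedulingAllowed(OrdinaryProcessor.class, ExecutionNode.ALL));
    }
}
```

With a check like this the framework would reject the configuration up front, whereas documentation alone leaves it to the user to notice.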

> Improve ListHDFS state tracking
> -------------------------------
>                 Key: NIFI-2853
>                 URL: https://issues.apache.org/jira/browse/NIFI-2853
>             Project: Apache NiFi
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Bryan Bende
>            Priority: Minor
> Currently ListHDFS tracks two properties in state management, 
> "listing.timestamp" and "emitted.timestamp". In the 1.0.0 release, the 
> directory property now supports expression language which means the directory 
> being listed could dynamically change on any execution of the processor. 
> The processor should be changed to store state specific to the directory that 
> was listed, for example "listing.timestamp.dir1" and "emitted.timestamp.dir1".
> This would also help in a clustered scenario... currently ListHDFS has to be 
> run on primary node only, otherwise each node will be overwriting each other's 
> state and producing unexpected results. With the above improvement, if the 
> directory evaluated to a unique path for each node, it would store the state 
> of each of those paths separately.
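
The per-directory keys described in the issue could be sketched roughly as follows. This is only an illustration over a plain Map<String, String>, not the processor's actual code: the class name, the helper methods, and the -1 "never listed" default are assumptions. Only the "listing.timestamp" and "emitted.timestamp" key names come from the description.

```java
import java.util.HashMap;
import java.util.Map;

class DirectoryStateSketch {

    // Key prefixes taken from the issue description; the directory is appended.
    static final String LISTING_PREFIX = "listing.timestamp.";
    static final String EMITTED_PREFIX = "emitted.timestamp.";

    static String listingKey(String directory) {
        return LISTING_PREFIX + directory;
    }

    static String emittedKey(String directory) {
        return EMITTED_PREFIX + directory;
    }

    // Returns a copy of the state with both timestamps updated for one directory only.
    static Map<String, String> update(Map<String, String> state, String directory,
                                      long listingTimestamp, long emittedTimestamp) {
        Map<String, String> updated = new HashMap<>(state);
        updated.put(listingKey(directory), Long.toString(listingTimestamp));
        updated.put(emittedKey(directory), Long.toString(emittedTimestamp));
        return updated;
    }

    // Reads the listing timestamp for a directory, or -1 if it was never listed (assumed default).
    static long listingTimestamp(Map<String, String> state, String directory) {
        String value = state.get(listingKey(directory));
        return value == null ? -1L : Long.parseLong(value);
    }

    public static void main(String[] args) {
        Map<String, String> state = new HashMap<>();
        state = update(state, "dir1", 100L, 90L);
        state = update(state, "dir2", 200L, 190L);
        // Updating dir2 leaves the dir1 entries untouched.
        System.out.println(listingTimestamp(state, "dir1"));
        System.out.println(listingTimestamp(state, "dir2"));
    }
}
```

Because each directory gets its own pair of keys, a listing of one path no longer overwrites the timestamps recorded for another.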

This message was sent by Atlassian JIRA
