[ https://issues.apache.org/jira/browse/NIFI-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bryan Bende updated NIFI-2853:
------------------------------
Description:
Currently ListHDFS tracks two properties in state management,
"listing.timestamp" and "emitted.timestamp". In the 1.0.0 release, the
directory property supports expression language, which means the directory
being listed could change dynamically on any execution of the processor.
The processor should be changed to store state specific to the directory that
was listed, for example "listing.timestamp.dir1" and "emitted.timestamp.dir1".
This would also help in a clustered scenario. Currently ListHDFS has to be
run on the primary node only; otherwise each node would overwrite the others'
state and produce unexpected results. With the above improvement, if the
directory evaluated to a unique path for each node, the processor would store
the state of each of those paths separately.
was:
Currently ListHDFS tracks two properties in state management,
"listing.timestamp" and "emitted.timestamp". In the 1.0.0 release, the
directory property now supports expression language which means the directory
being listed could dynamically change on any execution of the processor. The
processor should store state specific to the directory that was listed, for
example "listing.timestamp.dir1" and "emitted.timestamp.dir1".
This would also help in a clustered scenario. Currently ListHDFS has to be
run on the primary node only; otherwise each node would overwrite the others'
state and produce unexpected results. With the above improvement, if the
directory evaluated to a unique path for each node, the processor would store
the state of each of those paths separately.
> Improve ListHDFS state tracking
> -------------------------------
>
> Key: NIFI-2853
> URL: https://issues.apache.org/jira/browse/NIFI-2853
> Project: Apache NiFi
> Issue Type: Improvement
> Affects Versions: 1.0.0
> Reporter: Bryan Bende
> Priority: Minor
>
> Currently ListHDFS tracks two properties in state management,
> "listing.timestamp" and "emitted.timestamp". In the 1.0.0 release, the
> directory property supports expression language, which means the directory
> being listed could change dynamically on any execution of the processor.
> The processor should be changed to store state specific to the directory that
> was listed, for example "listing.timestamp.dir1" and "emitted.timestamp.dir1".
> This would also help in a clustered scenario. Currently ListHDFS has to be
> run on the primary node only; otherwise each node would overwrite the others'
> state and produce unexpected results. With the above improvement, if the
> directory evaluated to a unique path for each node, the processor would store
> the state of each of those paths separately.
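
The per-directory keys proposed in the description could look roughly like the
sketch below. It is only an illustration, not the ListHDFS implementation: the
class DirectoryScopedState and its methods are hypothetical, and a plain
Map<String, String> stands in for the processor's stored state. Only the two
key prefixes come from the issue text.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of NiFi: illustrates per-directory state keys.
final class DirectoryScopedState {
    // Key prefixes taken from the issue description.
    static final String LISTING_PREFIX = "listing.timestamp";
    static final String EMITTED_PREFIX = "emitted.timestamp";

    // Builds a key such as "listing.timestamp.dir1".
    static String key(String prefix, String directory) {
        return prefix + "." + directory;
    }

    // Returns a copy of the state with both timestamps for one directory
    // updated, leaving the entries of every other directory untouched.
    static Map<String, String> update(Map<String, String> current, String directory,
                                      long listingTs, long emittedTs) {
        Map<String, String> next = new HashMap<>(current);
        next.put(key(LISTING_PREFIX, directory), Long.toString(listingTs));
        next.put(key(EMITTED_PREFIX, directory), Long.toString(emittedTs));
        return next;
    }

    // Reads a timestamp for one directory; -1 means it was never listed.
    static long read(Map<String, String> state, String prefix, String directory) {
        String value = state.get(key(prefix, directory));
        return value == null ? -1L : Long.parseLong(value);
    }

    public static void main(String[] args) {
        Map<String, String> state = new HashMap<>();
        state = update(state, "dir1", 100L, 90L);
        state = update(state, "dir2", 200L, 190L);
        System.out.println(read(state, LISTING_PREFIX, "dir1"));
        System.out.println(read(state, EMITTED_PREFIX, "dir2"));
        System.out.println(read(state, LISTING_PREFIX, "dir3"));
    }
}
```

Because each directory writes only its own pair of keys, listing "dir2" does not
disturb the timestamps recorded for "dir1", which is the behavior the
description asks for when the directory changes between executions.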