[ 
https://issues.apache.org/jira/browse/NIFI-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro D'Armiento updated NIFI-6465:
----------------------------------------
    Description: 
h2. Current Situation

From [official 
documentation|https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.9.2/org.apache.nifi.processors.hadoop.ListHDFS/index.html]

* Each time a listing is performed, the files with the latest timestamp will be 
excluded and picked up during the next execution of the processor. This is done 
to ensure that we do not miss any files, or produce duplicates, in the cases 
where files with the same timestamp are written immediately before and after a 
single execution of the processor.

h2. Improvement Proposal

* If ListHDFS is invoked only after the operation that populates an HDFS 
directory has finished, skipping the files with the latest timestamp is 
pointless, and working around this behavior is tricky.
* A mandatory "skip last" property should be added so that users can decide, 
based on the use case, whether this behavior is needed.
* This is also particularly useful in combination with 
[NIFI-6462|https://issues.apache.org/jira/browse/NIFI-6462]


  was:
h2. Current Situation

From [official 
documentation|https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.9.2/org.apache.nifi.processors.hadoop.ListHDFS/index.html]

* Each time a listing is performed, the files with the latest timestamp will be 
excluded and picked up during the next execution of the processor. This is done 
to ensure that we do not miss any files, or produce duplicates, in the cases 
where files with the same timestamp are written immediately before and after a 
single execution of the processor.

h2. Improvement Proposal

* If we are calling the ListHDFS only after a certain operation which populates 
an HDFS directory has finished, it is pointless to skip the last file, and 
avoiding this behavior is tricky.
* A mandatory property "skip last" should be implemented in order to be able to 
actively decide whether or not this behavior is necessary, based on the use 
case.



> ListHDFS: skip last should be optional
> --------------------------------------
>
>                 Key: NIFI-6465
>                 URL: https://issues.apache.org/jira/browse/NIFI-6465
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>    Affects Versions: 1.9.2
>            Reporter: Alessandro D'Armiento
>            Priority: Minor
>
> h2. Current Situation
> From [official 
> documentation|https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.9.2/org.apache.nifi.processors.hadoop.ListHDFS/index.html]
> * Each time a listing is performed, the files with the latest timestamp will 
> be excluded and picked up during the next execution of the processor. This is 
> done to ensure that we do not miss any files, or produce duplicates, in the 
> cases where files with the same timestamp are written immediately before and 
> after a single execution of the processor.
> h2. Improvement Proposal
> * If ListHDFS is invoked only after the operation that populates an HDFS 
> directory has finished, skipping the files with the latest timestamp is 
> pointless, and working around this behavior is tricky.
> * A mandatory "skip last" property should be added so that users can decide, 
> based on the use case, whether this behavior is needed.
> * This is also particularly useful in combination with 
> [NIFI-6462|https://issues.apache.org/jira/browse/NIFI-6462]
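
For illustration only, the difference between the documented behavior and the proposed switch can be sketched as below. This is not NiFi code: `FileEntry`, `select_for_listing`, and the `skip_last` flag are hypothetical names chosen for this sketch, and the state tracking that the real processor performs between runs is deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for one file found in an HDFS listing."""
    path: str
    modified_ms: int


def select_for_listing(entries: List[FileEntry], skip_last: bool) -> List[FileEntry]:
    """Return the entries to emit in this run.

    skip_last=True:  entries sharing the newest modification timestamp are
                     held back for the next run (the documented behavior).
    skip_last=False: every entry is emitted (the behavior proposed here).
    """
    if not entries or not skip_last:
        return list(entries)
    newest = max(e.modified_ms for e in entries)
    return [e for e in entries if e.modified_ms < newest]


entries = [
    FileEntry("/data/a", 1000),
    FileEntry("/data/b", 2000),
    FileEntry("/data/c", 2000),
]
held_back = select_for_listing(entries, skip_last=True)
everything = select_for_listing(entries, skip_last=False)
```

With `skip_last=True` both files stamped `2000` are deferred, which is the case the proposal calls pointless when the directory is known to be complete; with `skip_last=False` all three entries are returned in a single run.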



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
