[ 
https://issues.apache.org/jira/browse/NIFI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro D'Armiento updated NIFI-6462:
----------------------------------------
    Priority: Minor  (was: Major)

> ListHDFS should be triggerable
> ------------------------------
>
>                 Key: NIFI-6462
>                 URL: https://issues.apache.org/jira/browse/NIFI-6462
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>    Affects Versions: 1.9.2
>            Reporter: Alessandro D'Armiento
>            Priority: Minor
>
> h2. Current Situation
> ListHDFS is designed to be used only as the entry point of a data integration 
> pipeline, and can therefore only be triggered by a cron or timer-driven schedule. 
> h2. Improvement Proposal
> ListHDFS should be usable as part of a pipeline even when it is not the 
> entry point. To achieve this: 
> * It has to be triggerable by an incoming flowfile
> * The trigger flowfile should be able to carry the listing directory as an 
> attribute
> * Some logic, such as "skip the last file in the listing directory", 
> should be made optional
> * Since the processor will have 1:N semantics (1 input trigger flowfile, 
> N output flowfiles), it would be nice to support fragmentation attributes (for 
> example, for subsequent merge operations)
>   * It would also be useful to support different fragmentation strategies, in 
> order to support multiple use cases. For example, it should be possible to 
> select:
>     *  A "one for all" fragmentation strategy, which creates a single 
> fragmentation group. All flowfiles therefore have the same 
> fragment.identifier and the same fragment.count, equal to the total number N 
> of listed files, and fragment.index ∈ [0, N).
>     *  A "per subdir" fragmentation strategy, which creates a separate 
> fragmentation group for each scanned subdirectory of the given path. 
> Flowfiles from the i-th subdirectory therefore share their own 
> fragment.identifier, have fragment.count equal to the number Ni of files in 
> that subdirectory, and have fragment.index ∈ [0, Ni).
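
The two fragmentation strategies described above can be illustrated with a small sketch. This is not NiFi code and not the proposed implementation: the function name assign_fragments, the strategy names "one_for_all" and "per_subdir", and the "path" key are hypothetical, chosen only for illustration. Only the fragment.identifier, fragment.count and fragment.index attribute names come from the issue text. Values are kept as integers here for readability.

```python
import posixpath
import uuid
from collections import defaultdict


def assign_fragments(paths, strategy="one_for_all"):
    """Return one attribute dict per listed file path (illustrative only)."""
    if strategy == "one_for_all":
        # A single group containing every listed file.
        groups = {"": list(paths)}
    elif strategy == "per_subdir":
        # One group per parent directory of the listed files.
        groups = defaultdict(list)
        for path in paths:
            groups[posixpath.dirname(path)].append(path)
    else:
        raise ValueError("unknown strategy: %r" % strategy)

    result = []
    for members in groups.values():
        # Every group gets its own identifier; the index restarts at 0.
        identifier = str(uuid.uuid4())
        for index, path in enumerate(members):
            result.append({
                "path": path,
                "fragment.identifier": identifier,
                "fragment.count": len(members),
                "fragment.index": index,
            })
    return result


if __name__ == "__main__":
    listing = ["/data/a/1.csv", "/data/a/2.csv", "/data/b/3.csv"]
    for attributes in assign_fragments(listing, "per_subdir"):
        print(attributes)
```

With "one_for_all", the three files in the example listing share one identifier with a count of 3. With "per_subdir", the files under /data/a form a group with a count of 2 and the file under /data/b forms a group with a count of 1.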



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
