[ 
https://issues.apache.org/jira/browse/NIFI-6286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837558#comment-16837558
 ] 

Bryan Bende commented on NIFI-6286:
-----------------------------------

Making ListHDFS support incoming flow files is a bit challenging because the 
processor needs to track state. If the directory comes from incoming flow 
files and dynamic values, there is potentially an unbounded number of 
directories, each with its own state to maintain. 
There have been long discussions about this in the past; one I remember was 
around ListSFTP 
(https://mail-archives.apache.org/mod_mbox/nifi-dev/201803.mbox/%[email protected]%3e).
 One suggestion was to have two variations of the processor: ListHDFS, which 
works as it does today, and something like ListHDFSOnce (a better name is 
needed), which just lists whatever directory it is given without tracking any 
state. The second processor would support incoming flow files.
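
To make the trade-off concrete, here is a rough sketch of the two variations. 
It is not NiFi code: the in-memory `fs` dictionary, the `StatefulLister` class 
and the `list_once` function are hypothetical names made up for illustration, 
and tracking a single "newest modification time" per directory is a 
simplifying assumption, not a description of how ListHDFS actually stores 
its state.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a file system:
# directory path -> list of (file name, modification time)
Listing = Dict[str, List[Tuple[str, int]]]


class StatefulLister:
    """Stateful variant: remembers, per directory, the newest
    modification time it has already listed (simplified assumption)."""

    def __init__(self) -> None:
        # One state entry per distinct directory ever listed.
        self.state: Dict[str, int] = {}

    def list_new(self, fs: Listing, directory: str) -> List[str]:
        last_seen = self.state.get(directory, -1)
        entries = fs.get(directory, [])
        # Only report files newer than what was seen on earlier runs.
        new_files = [name for name, mtime in entries if mtime > last_seen]
        if entries:
            newest = max(mtime for _, mtime in entries)
            self.state[directory] = max(last_seen, newest)
        return new_files


def list_once(fs: Listing, directory: str) -> List[str]:
    """Stateless variant: lists whatever is in the directory right now
    and remembers nothing between calls."""
    return [name for name, _ in fs.get(directory, [])]


if __name__ == "__main__":
    fs: Listing = {"/a": [("f1", 1), ("f2", 2)], "/b": [("g1", 5)]}
    lister = StatefulLister()
    print(lister.list_new(fs, "/a"))  # first run reports both files
    print(lister.list_new(fs, "/a"))  # second run reports nothing new
    print(lister.list_new(fs, "/b"))  # a new directory adds a state entry
    print(len(lister.state))          # grows with each distinct directory
    print(list_once(fs, "/a"))        # always the full listing, no state
```

The stateful variant keeps one entry per distinct directory, so directories 
arriving dynamically from flow files make that map grow without bound. The 
stateless variant avoids this, but it reports the same files again on every 
call.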

 

> Make listHDFS work as INPUT_ALLOWED processor
> ---------------------------------------------
>
>                 Key: NIFI-6286
>                 URL: https://issues.apache.org/jira/browse/NIFI-6286
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>    Affects Versions: 1.9.2
>            Reporter: Jasper Knulst
>            Priority: Major
>              Labels: features
>
> Currently the ListHDFS processor has a 'Directory' property (where the 
> listing starts, recursively or not) that only accepts a single static value.
> There are many use cases where you would want to crawl many roots in 
> sequence. There are 2 ways to do it:
>  1. Allow the 'Directory' property to take multiple comma-separated values
>  2. Refactor ListHDFS as an INPUT_ALLOWED processor and make the 'Directory' 
>     property accept Expression Language (EL) so that directory roots can 
>     come from upstream
> Option 1 has serious restrictions: the other configuration (like 
> recursive, filter type and regex) would still be static, and the setup may 
> become very complex and non-intuitive and require frequent re-configuration.
> Option 2 is the way to go.
> Some things to consider:
>  - The way ListHDFS behaves now should be preserved
>  - It makes sense to set 'Directory', 'Recursiveness', 'Regex' and 
>    'Filter type' dynamically and in tandem, to control how each root 
>    directory is crawled
>  - Switching 'Directory' means that not just 1 state is stored, but one 
>    state for each directory that has ever passed through



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
