[ 
https://issues.apache.org/jira/browse/NIFI-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377031#comment-16377031
 ] 

Ed Berezitsky edited comment on NIFI-4906 at 2/26/18 3:28 PM:
--------------------------------------------------------------

ListHDFS is stateful and doesn't include directory-level info in its result set. It 
also doesn't support incoming connections. Sometimes you don't need to "get" a 
file (list + fetch); you just need to know whether the file(s)/dir(s) exist, 
along with all the related information (size, permissions, and the other fields 
listed in the description). Since HDF is not always running on an edge node of 
the HDP cluster, you also cannot use execute script to run hdfs dfs commands. So 
this effort is to create a kind of HDFS client for read-only operations (-count, 
-du, -ls, -test, and some others).

I hope it makes sense.


was (Author: bdesert):
ListHDFS is stateful and doesn't include directory-level info in its result set. It 
also doesn't support incoming connections. Sometimes you don't need to "get" a 
file; you just need to know whether the file(s)/dir(s) exist, along with all the 
related information (size, permissions, and the other fields listed in the 
description). Since HDF is not always running on an edge node of the HDP cluster, 
you also cannot use execute script to run hdfs dfs commands. So this effort is to 
create a kind of HDFS client for read-only operations (-count, -du, -ls, -test, 
and some others).

I hope it makes sense.

> Add GetHdfsFileInfo Processor
> -----------------------------
>
>                 Key: NIFI-4906
>                 URL: https://issues.apache.org/jira/browse/NIFI-4906
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Ed Berezitsky
>            Assignee: Ed Berezitsky
>            Priority: Major
>
> Add *GetHdfsFileInfo* Processor to be able to get stats from a file system.
> This processor should support recursive scans and return information about 
> both directories and files.
> _File-level info required_: name, path, length, modified timestamp, last 
> access timestamp, owner, group, permissions.
> _Directory-level info required_: name, path, sum of lengths of files under a 
> dir, count of files under a dir, modified timestamp, last access timestamp, 
> owner, group, permissions.
>  
> The result is returned either:
>  * in a single flow file (in content: one JSON line per file/dir info); or
>  * as one flow file per file/dir info (in content as a JSON object, or as a set 
> of attributes, by choice).
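
To make the requested output shape concrete, here is a minimal sketch of the "one JSON line per file/dir" mode with directory-level aggregates (sum of file lengths and file count under a dir). It is not the processor's implementation: the `Node` class is a hypothetical in-memory stand-in for HDFS entries, the JSON key names are assumptions, and owner, group, permissions, and timestamps are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

public class FileInfoSketch {

    /** Hypothetical in-memory stand-in for one HDFS file or directory. */
    static class Node {
        final String path;
        final boolean isDir;
        final long length; // own length; 0 for directories
        final List<Node> children = new ArrayList<>();

        Node(String path, boolean isDir, long length) {
            this.path = path;
            this.isDir = isDir;
            this.length = length;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }

        /** Sum of lengths of all files at or under this node. */
        long totalLength() {
            long sum = length;
            for (Node c : children) sum += c.totalLength();
            return sum;
        }

        /** Count of files (not directories) at or under this node. */
        long fileCount() {
            long n = isDir ? 0 : 1;
            for (Node c : children) n += c.fileCount();
            return n;
        }
    }

    /** One JSON line per entry. Key names are illustrative; paths are not escaped. */
    static String toJsonLine(Node n) {
        StringBuilder sb = new StringBuilder("{");
        sb.append("\"path\":\"").append(n.path).append("\",");
        sb.append("\"type\":\"").append(n.isDir ? "dir" : "file").append("\",");
        sb.append("\"length\":").append(n.totalLength());
        if (n.isDir) sb.append(",\"fileCount\":").append(n.fileCount());
        return sb.append("}").toString();
    }

    /** Recursive scan: emit the entry itself, then everything under it. */
    static List<String> scan(Node root) {
        List<String> lines = new ArrayList<>();
        lines.add(toJsonLine(root));
        for (Node c : root.children) lines.addAll(scan(c));
        return lines;
    }

    public static void main(String[] args) {
        Node root = new Node("/data", true, 0)
            .add(new Node("/data/a.txt", false, 10))
            .add(new Node("/data/sub", true, 0)
                .add(new Node("/data/sub/b.txt", false, 32)));
        for (String line : scan(root)) System.out.println(line);
    }
}
```

In the single-flow-file mode the lines from `scan` would be joined into one content body; in the per-entry mode each line (or the equivalent set of attributes) would go into its own flow file.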



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
