[
https://issues.apache.org/jira/browse/NIFI-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377031#comment-16377031
]
Ed Berezitsky edited comment on NIFI-4906 at 2/26/18 3:28 PM:
--------------------------------------------------------------
ListHDFS is stateful and doesn't support directory-level info in its result
set. It also doesn't support incoming connections. Sometimes you don't need to
"get" a file (list + fetch); you just need to know whether the file(s)/dir(s)
exist, along with all the information related to them (size, permissions, and
the others listed in the description). Since HDF is not always running on an
edge node of an HDP cluster, you also cannot use execute script to run hdfs dfs
commands. So this effort is to create a kind of HDFS client for read-only
operations (-count, -du, -ls, -test, and some others).
I hope that makes sense.
was (Author: bdesert):
ListHDFS is stateful and doesn't support directory-level info in its result
set. It also doesn't support incoming connections. Sometimes you don't need to
"get" a file; you just need to know whether the file(s)/dir(s) exist, along
with all the information related to them (size, permissions, and the others
listed in the description). Since HDF is not always running on an edge node of
an HDP cluster, you also cannot use execute script to run hdfs dfs commands. So
this effort is to create a kind of HDFS client for read-only operations
(-count, -du, -ls, -test, and some others).
I hope that makes sense.
> Add GetHdfsFileInfo Processor
> -----------------------------
>
> Key: NIFI-4906
> URL: https://issues.apache.org/jira/browse/NIFI-4906
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: Ed Berezitsky
> Assignee: Ed Berezitsky
> Priority: Major
>
> Add *GetHdfsFileInfo* Processor to be able to get stats from a file system.
> This processor should support recursive scans, returning information about
> directories and files.
> _File-level info required_: name, path, length, modified timestamp, last
> access timestamp, owner, group, permissions.
> _Directory-level info required_: name, path, sum of lengths of files under a
> dir, count of files under a dir, modified timestamp, last access timestamp,
> owner, group, permissions.
>
> The result is returned either:
> * in a single flow file (in the content, one JSON line per file/dir info); or
> * as one flow file per file/dir info (in the content as a JSON object, or as a
> set of attributes, by choice).
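
To make the requested output shapes concrete, here is a minimal Python sketch. It is not the proposed processor: it walks a local directory tree as a stand-in for HDFS, and the function names, the JSON field names, and the "hdfs." attribute prefix are all hypothetical. It also assumes that the directory-level length and file count are recursive totals (similar to what hdfs dfs -count reports), which the description does not state explicitly.

```python
import json
import os
import stat
import tempfile


def _scan(path, records):
    """Append a record for path and everything under it.

    Returns (total_length, file_count) so parents can aggregate.
    """
    st = os.lstat(path)
    record = {
        "name": os.path.basename(path),
        "path": os.path.dirname(path),
        "type": "dir" if stat.S_ISDIR(st.st_mode) else "file",
        "modified": int(st.st_mtime * 1000),   # epoch milliseconds
        "accessed": int(st.st_atime * 1000),   # epoch milliseconds
        "owner": st.st_uid,                    # numeric IDs on a local FS
        "group": st.st_gid,
        "permissions": stat.filemode(st.st_mode),
    }
    records.append(record)  # a directory is listed before its children
    if stat.S_ISDIR(st.st_mode):
        total, count = 0, 0
        for name in sorted(os.listdir(path)):
            sub_total, sub_count = _scan(os.path.join(path, name), records)
            total += sub_total
            count += sub_count
        # Assumption: directory totals cover the whole subtree.
        record["length"] = total
        record["count"] = count
        return total, count
    record["length"] = st.st_size
    return st.st_size, 1


def list_info(root):
    """Return one info record per file/dir under root (root included)."""
    records = []
    _scan(os.path.abspath(root), records)
    return records


def as_single_flow_file(records):
    """Option 1: one content body with a JSON line per file/dir."""
    return "\n".join(json.dumps(r) for r in records)


def as_attribute_sets(records):
    """Option 2: one attribute map per file/dir (hypothetical 'hdfs.' prefix)."""
    return [{"hdfs." + k: str(v) for k, v in r.items()} for r in records]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sub"))
        with open(os.path.join(root, "sub", "b.txt"), "w") as f:
            f.write("abc")
        print(as_single_flow_file(list_info(root)))
```

A real implementation would read the same fields from the Hadoop client API rather than os.lstat, and would report owner and group as names rather than numeric IDs.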
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)