[
https://issues.apache.org/jira/browse/NIFI-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16411021#comment-16411021
]
Pierre Villard commented on NIFI-5004:
--------------------------------------
Can't you use the existing processors to perform this operation? For FTP2HDFS
for instance, what you can do is:
ListFTP (or ListSFTP) -> RemoteProcessGroup
InputPort -> FetchFTP (or FetchSFTP) -> PutHDFS
This way you will get the data in a distributed manner from your FTP server and
will push the data into HDFS.
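As a rough illustration of that List/Fetch pattern (not NiFi code), the sketch below models the three roles in plain Python. Everything in it is hypothetical: the in-memory FTP_SERVER and HDFS dicts, the node names, and the round-robin split, which only stands in for however the RemoteProcessGroup actually balances flow files across the cluster.

```python
from itertools import cycle

# In-memory stand-ins for the FTP server and HDFS (hypothetical data).
FTP_SERVER = {"/in/a.csv": b"a-data", "/in/b.csv": b"b-data", "/in/c.csv": b"c-data"}
HDFS = {}


def list_ftp(server):
    """ListFTP role: runs once and emits one small listing entry per remote file."""
    return sorted(server)


def distribute(paths, nodes):
    """RemoteProcessGroup role: spread listing entries across cluster nodes.

    Round-robin is an assumption made for this sketch only.
    """
    assignment = {node: [] for node in nodes}
    for path, node in zip(paths, cycle(nodes)):
        assignment[node].append(path)
    return assignment


def fetch_and_put(node, paths, server, hdfs):
    """FetchFTP -> PutHDFS role: a node pulls only its own files and stores them."""
    for path in paths:
        content = server[path]
        file_name = path.rsplit("/", 1)[-1]
        hdfs["/landing/" + file_name] = (node, content)


def run_flow(nodes):
    """Wire the three roles together and return which node handled which file."""
    assignment = distribute(list_ftp(FTP_SERVER), nodes)
    for node, paths in assignment.items():
        fetch_and_put(node, paths, FTP_SERVER, HDFS)
    return assignment
```

Calling run_flow(["node1", "node2"]) returns the per-node assignment and fills HDFS with one entry per file. The point is that only the cheap listing step runs in one place, while the transfer itself is split across the nodes.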
If you want to use MR jobs, why do you need NiFi? You can already trigger the
execution of the jobs directly on your Hadoop cluster. No?
> Ability to Execute File (FTP/CIFS/SFTP) Copy jobs on Mapreduce From Nifi
> ------------------------------------------------------------------------
>
> Key: NIFI-5004
> URL: https://issues.apache.org/jira/browse/NIFI-5004
> Project: Apache NiFi
> Issue Type: Wish
> Reporter: Greg Senia
> Priority: Critical
>
> Would like to see NiFi run programs on MapReduce. Examples of these are
> FTP2HDFS [https://github.com/gss2002/ftp2hdfs] and CIFS2HDFS
> [https://github.com/gss2002/cifs2hdfs], each run as a MapReduce application where
> the final resting place is HDFS without any type of data transform on the way in.
> This would reduce overhead on the NiFi node and move the incoming data
> directly to the datanode via short-circuit reads/writes. I currently have
> these two applications running as MR jobs, and would like to be able to do
> this from within NiFi, pointing at HDFS/YARN.