[ 
https://issues.apache.org/jira/browse/NIFI-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16411605#comment-16411605
 ] 

Greg Senia commented on NIFI-5004:
----------------------------------

[~pvillard] so should NiFi only be used if it is modifying the flow? If it's 
not modifying the flow, then should NiFi not be involved with the flow? In my 
discussions with vendors who support NiFi, they want it to be the end-all be-all 
of data flows into a data lake. So I wonder why I'm told I should be replacing 
Sqoop jobs with similar functions in NiFi to land the data in Hadoop. Why would 
I want NiFi to be a middle man when the data ends up inside of a Data Lake/Hub 
(Hadoop, etc.), and when MR or Spark can drive the data directly into HDFS, 
bypassing the NiFi nodes? Like I said, this is just a thought on ways to make 
data flows more efficient while still using NiFi to track the flows. 
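To make the idea concrete, here is a minimal sketch of how a NiFi ExecuteStreamCommand (or ExecuteProcess) processor might hand the ingest off to YARN by building a `hadoop jar` command line, so the bulk data path bypasses the NiFi node while NiFi still tracks the flow. The ftp2hdfs option names below are hypothetical placeholders, not taken from the actual repo:

```python
# Sketch: assemble the command an ExecuteStreamCommand processor could run
# to submit the FTP-to-HDFS copy as a MapReduce job on YARN. The data then
# moves FTP server -> MR task -> datanode, never through the NiFi JVM.
def build_ftp2hdfs_cmd(ftp_host, src_dir, hdfs_dest):
    # Option names are illustrative only; the real ftp2hdfs CLI may differ.
    return [
        "hadoop", "jar", "ftp2hdfs.jar",
        "--ftp_server", ftp_host,   # hypothetical flag
        "--source_dir", src_dir,    # hypothetical flag
        "--hdfs_dest", hdfs_dest,   # hypothetical flag
    ]

cmd = build_ftp2hdfs_cmd("ftp.example.com", "/outbound", "/data/landing")
print(" ".join(cmd))
```

NiFi would then only carry the small trigger/status FlowFiles (for provenance and monitoring), not the payload itself.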

> Ability to Execute File (FTP/CIFS/SFTP) Copy jobs on Mapreduce From Nifi
> ------------------------------------------------------------------------
>
>                 Key: NIFI-5004
>                 URL: https://issues.apache.org/jira/browse/NIFI-5004
>             Project: Apache NiFi
>          Issue Type: Wish
>            Reporter: Greg Senia
>            Priority: Critical
>
> Would like to see NiFi run programs on MapReduce. Examples of these include 
> FTP2HDFS [https://github.com/gss2002/ftp2hdfs] and CIFS2HDFS 
> [https://github.com/gss2002/cifs2hdfs], run as MapReduce applications where the 
> final resting place is HDFS, without any type of data transform on the way in. 
> This would reduce overhead on the NiFi node and move the incoming data 
> directly to the datanode via short-circuit reads/writes. I currently have 
> these two applications running as MR jobs, and it would be useful to be able 
> to launch them from within NiFi, pointing at HDFS/YARN.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
