Why do you need to use Spark or Flume for this? You can just use curl
and hdfs:

  curl ftp://blah | hdfs dfs -put - /blah
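For instance, something along these lines (hostname, credentials and
paths below are made up, adjust for your setup):

  # Stream the file from the FTP server straight into HDFS. "-put -"
  # tells hdfs to read from stdin, so the file is never held in memory
  # or staged on local disk, no matter how large it is.
  curl -s ftp://user:pass@ftp.example.com/exports/big-file.dat \
    | hdfs dfs -put - /data/incoming/big-file.dat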
On Fri, Aug 14, 2015 at 1:15 PM, Varadhan, Jawahar
<varad...@yahoo.com.invalid> wrote:
> What is the best way to bring such a huge file from an FTP server into
> Hadoop to persist in HDFS? Since a single JVM process might run out of
> memory, I was wondering if I can use Spark or Flume to do this. Any
> help on this matter is appreciated.
>
> I prefer an application/process running inside Hadoop which does this
> transfer.
>
> Thanks.

-- 
Marcelo