What is the best way to bring such a huge file from an FTP server into Hadoop and persist it in HDFS? Since a single JVM process might run out of memory, I was wondering if I could use Spark or Flume to do this. Any help on this matter is appreciated. I would prefer an application/process running inside Hadoop to do this transfer. Thanks.
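For illustration only (not from the thread itself): one way to keep a single JVM's memory bounded is to stream the FTP source directly into an HDFS output stream in fixed-size chunks instead of buffering the whole file. Below is a minimal Java sketch of that idea; the host, credentials, and paths are placeholders, and it assumes the Hadoop client and Apache Commons Net libraries are on the classpath.

```java
import org.apache.commons.net.ftp.FTP;
import org.apache.commons.net.ftp.FTPClient;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
import java.io.InputStream;

public class FtpToHdfs {
    public static void main(String[] args) throws IOException {
        FTPClient ftp = new FTPClient();
        ftp.connect("ftp.example.com");                // placeholder host
        ftp.login("user", "password");                 // placeholder credentials
        ftp.enterLocalPassiveMode();
        ftp.setFileType(FTP.BINARY_FILE_TYPE);

        Configuration conf = new Configuration();      // picks up core-site.xml / hdfs-site.xml
        FileSystem hdfs = FileSystem.get(conf);

        // Copy in fixed-size chunks: heap usage stays bounded by the buffer size,
        // regardless of how large the source file is.
        try (InputStream in = ftp.retrieveFileStream("/data/huge-file.dat");          // placeholder path
             FSDataOutputStream out = hdfs.create(new Path("/ingest/huge-file.dat"))) // placeholder path
        {
            byte[] buffer = new byte[8 * 1024 * 1024]; // 8 MB chunks
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        ftp.completePendingCommand();                  // finish the FTP transfer cleanly
        ftp.logout();
        ftp.disconnect();
    }
}
```

This is a single-process sketch; whether a tool like Flume, distcp, or a Spark job is a better fit depends on the cluster setup being asked about.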
- Setting up Spark/Flume/? to Ingest 10TB from FTP - Varadhan, Jawahar
- Re: Setting up Spark/Flume/? to Ingest 10TB from FTP - Marcelo Vanzin