Scenario:
Hadoop version: 0.20.2
MapReduce coding will be done in Java.


I'm just starting out with my first Hadoop setup and would like to know whether there are any best practices for loading data into HDFS. So far I have (obviously) put data files into HDFS manually with the shell commands while experimenting during setup. Going forward, I will be retrieving large numbers of data feeds from remote, third-party locations and loading them into Hadoop for later analysis. What is the best way to automate this? Should I gather the retrieved files into known locations to be mounted, and then use a script to put the files into HDFS? Or is there some other common practice? I have not been able to find a specific use case yet; all the docs cover the basic fs commands without giving much detail about more advanced setups.
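
To make the script-based option concrete, here is a rough sketch of what I have in mind; it is an untested idea, not a working setup. It assumes the feeds have already been downloaded into a local staging directory, and it relies only on the standard "hadoop fs -put" command. The function name load_staged_files, the HADOOP_CMD variable, the "loaded" subdirectory, and the example paths are all hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: push every file from a local staging directory into HDFS.
# All paths and names below are hypothetical placeholders.

# HADOOP_CMD can be overridden, e.g. HADOOP_CMD=echo for a dry run.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# Usage: load_staged_files <local_staging_dir> <hdfs_target_dir>
load_staged_files() {
    staging_dir="$1"
    hdfs_dir="$2"
    done_dir="$staging_dir/loaded"   # files are moved here once loaded

    mkdir -p "$done_dir" || return 1

    for f in "$staging_dir"/*; do
        # Skip subdirectories (including "loaded") and an unmatched glob.
        [ -f "$f" ] || continue

        # "hadoop fs -put" copies a local file into HDFS. Only move the
        # local file aside if the copy reported success.
        if $HADOOP_CMD fs -put "$f" "$hdfs_dir/"; then
            mv "$f" "$done_dir/"
        else
            echo "failed to load $f" >&2
        fi
    done
}

# Run only when both arguments are supplied, e.g. from cron:
#   load_feeds.sh /data/incoming /feeds/raw
if [ "$#" -ge 2 ]; then
    load_staged_files "$1" "$2"
fi
```

The idea would be to run this periodically (for example from cron) and have the download jobs write each feed under a temporary name elsewhere, then rename it into the staging directory once complete, so that half-written files are never picked up. Whether this is a reasonable approach at scale is exactly what I am unsure about.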

Thanks for any info.

Regards
