Arun C Murthy wrote:
On Feb 23, 2009, at 2:01 AM, Bing TANG wrote:
Hi, everyone,
Could someone explain how the "-file" option works in Hadoop
Streaming? I want to ship a big file to the slaves; how does that work?
Does Hadoop use SCP to copy it? How does Hadoop handle the -file option?
No, -file just copies the file from the local filesystem to HDFS, and
the DistributedCache copies it to the local filesystem of the node on
which the map/reduce task runs.
The -file option does not use the DistributedCache yet; HADOOP-2622 is
still open for that. The -file option ships the files along with the
streaming jar: it unpacks the jar, copies the files in, and packs the jar
again. You can use -files, -libjars and -archives to copy files to the
distributed cache.
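A jar is a zip archive, so the unpack/copy/repack step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the helper name `ship_files_in_jar` and its arguments are hypothetical, and this is not the actual Hadoop Streaming code (which is Java). It ignores details such as where the real tool writes the resulting job jar.

```python
import os
import shutil
import tempfile
import zipfile


def ship_files_in_jar(jar_path, files, out_path):
    """Hypothetical sketch of the -file approach: unpack, add files, repack.

    Not Hadoop Streaming's implementation; for illustration only.
    """
    workdir = tempfile.mkdtemp()
    try:
        # 1. Unpack the original streaming jar (a jar is a zip archive).
        with zipfile.ZipFile(jar_path) as jar:
            jar.extractall(workdir)
        # 2. Copy each shipped file into the top level of the unpacked tree.
        for f in files:
            shutil.copy(f, os.path.join(workdir, os.path.basename(f)))
        # 3. Pack the whole tree, shipped files included, into a new jar.
        with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as out:
            for root, _dirs, names in os.walk(workdir):
                for name in names:
                    full = os.path.join(root, name)
                    out.write(full, os.path.relpath(full, workdir))
    finally:
        # Remove the temporary unpacked copy.
        shutil.rmtree(workdir)
    return out_path
```

One consequence of this design: every file shipped with -file becomes part of the job jar, so a big file makes the jar itself big. That is presumably one reason the -files/-archives route through the distributed cache can suit large files better.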
-Amareshwari
Arun