On Jul 18, 2008, at 4:53 PM, Steve Gao wrote:
Hi All,
I am using Hadoop Streaming, and I am confused by two streaming
options: -file and -cacheFile. It seems that they mean the same thing,
right?
The difference is that -file 'ships' a file from your local machine to
the cluster, while -cacheFile assumes that the file is already present on
HDFS at the given path.
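
A minimal sketch of that difference, written as a small Python helper that
only builds the extra streaming arguments (it does not run Hadoop). The
helper name, the paths, and the namenode address are hypothetical; the
'#lookup.txt' suffix on the HDFS URI names the symlink that appears in each
task's working directory.

```python
def side_file_args(local_file=None, hdfs_uri=None):
    """Build the Hadoop Streaming arguments for one side file."""
    args = []
    if local_file is not None:
        # -file: the file is on the submitting machine and is shipped with the job.
        args += ["-file", local_file]
    if hdfs_uri is not None:
        # -cacheFile: the file is already on HDFS; nothing is uploaded.
        args += ["-cacheFile", hdfs_uri]
    return args


# Hypothetical paths, for illustration only.
ship_local = side_file_args(local_file="/home/me/lookup.txt")
use_hdfs = side_file_args(
    hdfs_uri="hdfs://namenode:9000/data/lookup.txt#lookup.txt"
)

print(" ".join(ship_local))
print(" ".join(use_hdfs))
```

Either list would be appended to the usual streaming invocation alongside
-input, -output, -mapper and -reducer.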
Two other confusing options are -numReduceTasks and -jobconf
mapred.reduce.tasks. Both are used to control (or give a hint for) the
number of reducers.
Yes, they are equivalent.
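
As a rough illustration, the two spellings can be written side by side. The
Python helpers below are hypothetical and only build argument lists; the
reducer count of 4 is an arbitrary example value.

```python
def reducers_via_flag(n):
    """Request n reducers with the dedicated streaming option."""
    return ["-numReduceTasks", str(n)]


def reducers_via_jobconf(n):
    """Request n reducers by setting the job property directly."""
    return ["-jobconf", "mapred.reduce.tasks=%d" % n]


flag_form = reducers_via_flag(4)
jobconf_form = reducers_via_jobconf(4)

print(" ".join(flag_form))
print(" ".join(jobconf_form))
```

Only one of the two forms needs to appear on the command line.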
hth,
Arun