On Jul 18, 2008, at 4:53 PM, Steve Gao wrote:

Hi All,
I am using Hadoop Streaming. I am confused by two streaming options: -file and -cacheFile. It seems that they mean the same thing, right?


The difference is that -file will 'ship' a file from your local filesystem to the cluster along with the job, while -cacheFile assumes that the file is already present on HDFS at the given path.
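
As a rough sketch of how the two options sit side by side on a command line, the Python snippet below only assembles the argument list for a streaming job; it does not run anything. The jar name, input/output directories, script name, and HDFS URI are all hypothetical placeholders. The "#lookup.txt" fragment on the -cacheFile URI is the link name the cached file is expected to appear under in the task's working directory.

```python
def streaming_args(local_script, hdfs_uri):
    """Assemble (but do not run) a Hadoop Streaming command line."""
    return [
        "hadoop", "jar", "hadoop-streaming.jar",  # hypothetical jar path
        "-input", "in_dir",                       # hypothetical HDFS input dir
        "-output", "out_dir",                     # hypothetical HDFS output dir
        "-mapper", local_script,
        # -file: a local file, shipped to the cluster together with the job
        "-file", local_script,
        # -cacheFile: a file that must already exist on HDFS at this URI
        "-cacheFile", hdfs_uri,
    ]


args = streaming_args(
    "mapper.py",                                        # hypothetical local script
    "hdfs://namenode:9000/data/lookup.txt#lookup.txt",  # hypothetical HDFS URI
)
print(" ".join(args))
```

The resulting list could be handed to something like subprocess.run on a machine with Hadoop installed, after replacing the placeholders with real paths.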

Two other confusing options are -numReduceTasks and -jobconf mapred.reduce.tasks. Both are used to control (or give a hint for) the number of reducers.


Yes, they are equivalent.
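
For illustration, here is a small sketch of the two spellings as command-line fragments. The helper name reducer_args is made up for this example and is not part of Hadoop; it simply returns one form or the other for a given reducer count.

```python
def reducer_args(n, use_jobconf=False):
    """Return the streaming arguments that request n reducers.

    Hypothetical helper: both forms set the same job property.
    """
    if use_jobconf:
        # Generic form: set the job configuration property directly
        return ["-jobconf", "mapred.reduce.tasks=%d" % n]
    # Streaming shorthand for the same setting
    return ["-numReduceTasks", str(n)]


print(" ".join(reducer_args(4)))
print(" ".join(reducer_args(4, use_jobconf=True)))
```

Either fragment would be appended to the rest of the streaming command line; passing both at once is unnecessary.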

hth,
Arun
