Less work: local mode skips setting up the input splits, distributing the job jar files, scheduling the map tasks on the task trackers, collecting the task status reports, starting the reduce tasks, collecting the map outputs, sorting them, feeding them to the reduce tasks, and writing the results to HDFS. There is no JVM forking, either. The coordination overhead is not small, but it pays off when the jobs are large and the number of machines is large.
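For reference, a self-contained driver along the lines of the snippet Aaron quotes below might look roughly like this. This is only a sketch against the old org.apache.hadoop.mapred API; the class name LocalDebugDriver, the identity mapper/reducer, and the /tmp paths are placeholders, not something from this thread:

// Sketch of a driver forced onto the LocalJobRunner; placeholder names throughout.
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class LocalDebugDriver {
  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(LocalDebugDriver.class);
    conf.setJobName("local-debug");

    // Run the whole job in this one process: no job tracker, no task trackers,
    // no jar distribution, no per-task JVM forking.
    conf.set("mapred.job.tracker", "local");
    // Read and write the local filesystem instead of HDFS.
    conf.set("fs.default.name", "file:///");

    // Placeholder pass-through mapper/reducer; substitute your own job's classes.
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);
    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);

    // Placeholder local paths.
    FileInputFormat.setInputPaths(conf, new Path("/tmp/local-debug-input"));
    FileOutputFormat.setOutputPath(conf, new Path("/tmp/local-debug-output"));

    // Any println in the mapper or reducer shows up on this terminal
    // rather than in userlogs/<task_id>/stdout.
    JobClient.runJob(conf);
  }
}

Run it like any other driver class with the Hadoop jars on the classpath and the println output appears right in the terminal.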
local works great for jobs that really only need the resources of a single machine and only require a single reduce task.

On Fri, May 1, 2009 at 6:09 PM, Asim <linka...@gmail.com> wrote:
> Thanks Aaron. That worked! However, when I run everything as local, I
> see everything executing much faster on local as compared to a single
> node. Is there any reason for the same?
>
> -Asim
>
> On Thu, Apr 30, 2009 at 9:23 AM, Aaron Kimball <aa...@cloudera.com> wrote:
> > First thing I would do is to run the job in the local jobrunner (as a single
> > process on your local machine without involving the cluster):
> >
> > JobConf conf = .....
> > // set other params, mapper, etc. here
> > conf.set("mapred.job.tracker", "local"); // use localjobrunner
> > conf.set("fs.default.name", "file:///"); // read from local hard disk instead of hdfs
> >
> > JobClient.runJob(conf);
> >
> > This will actually print stdout, stderr, etc. to your local terminal. Try
> > this on a single input file. This will let you confirm that it does, in
> > fact, write to stdout.
> >
> > - Aaron
> >
> > On Thu, Apr 30, 2009 at 9:00 AM, Asim <linka...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I am not able to see any job output in userlogs/<task_id>/stdout. It
> >> remains empty even though I have many println statements. Are there
> >> any steps to debug this problem?
> >>
> >> Regards,
> >> Asim
> >>

--
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422