I believe you want to ship the data to each node in your cluster before the MapReduce job begins, so that the mappers can read the files from their local disk instead of from HDFS. The Hadoop tutorial on YDN has some good info on this:
http://developer.yahoo.com/hadoop/tutorial/module5.html#auxdata

-Prashant Kommireddi

On Fri, Nov 25, 2011 at 1:05 AM, Andy Doddington <a...@doddington.net> wrote:

> I have a series of mappers that I would like to be passed data using the
> distributed cache mechanism. At the moment, I am using HDFS to pass the
> data, but this seems wasteful to me, since they are all reading the same
> data.
>
> Is there a piece of example code that shows how data files can be placed
> in the cache and accessed by mappers?
>
> Thanks,
>
> Andy Doddington
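
On the request for example code: below is a rough sketch of the mapper-side half, i.e. reading a side file from local disk once per task and keeping it in memory. It is plain Java so it runs without Hadoop on the classpath; the two DistributedCache calls that would ship the file and locate the local copy appear only as comments. The class name SideDataLoader, the path /user/andy/lookup.txt and the tab-separated key/value format are all assumptions made for illustration, not something from the tutorial or from your job.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: the name and the tab-separated file format are assumptions.
final class SideDataLoader {

    // Driver side, before submitting the job (needs Hadoop, so comment only;
    // the HDFS path is a made-up example):
    //   DistributedCache.addCacheFile(new URI("/user/andy/lookup.txt"), conf);
    //
    // Mapper side, once per task (e.g. in configure() or setup()):
    //   Path[] cached = DistributedCache.getLocalCacheFiles(conf);
    //   Map<String, String> lookup = SideDataLoader.load(cached[0].toString());

    // Reads "key<TAB>value" lines from a file on the local disk into a map.
    static Map<String, String> load(String localPath) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        try (BufferedReader in = Files.newBufferedReader(
                Paths.get(localPath), StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip lines without a key/value separator
                }
                lookup.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return lookup;
    }
}
```

The point of loading in configure()/setup() rather than in map() is that the file is read once per task, from the local copy, rather than once per record or over the network.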