Hi,

@Ted, the code below is internal. Users are not expected to call DistributedCache.getLocalCache(), and in practice they cannot, since they do not know all of its parameters.

@Larry, DistributedCache has not been changed to use the new API in branch 0.20; that change exists only from branch 0.21 onwards. See MAPREDUCE-898 (https://issues.apache.org/jira/browse/MAPREDUCE-898). If you are using branch 0.20, you are encouraged to use the deprecated JobConf itself. Alternatively, you can try the following change in your code: change the line

> > > DistributedCache.addCacheFile(new Path(args[0]).toUri(), conf);

to

DistributedCache.addCacheFile(new Path(args[0]).toUri(), job.getConfiguration());
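For reference, here is a rough sketch of how the whole driver might look after that change (the class name CacheDriver is just a placeholder, the mapper/reducer setup is elided, and I am assuming the Job constructor takes a copy of the Configuration passed to it, which is why files added to 'conf' afterwards are not seen by the job):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // Hypothetical driver class name; mapper/reducer/IO setup is elided.
    public class CacheDriver extends Configured implements Tool {

        public int run(String[] args) throws Exception {
            Configuration conf = getConf();
            Job job = new Job(conf, "Job");
            // ... set mapper, reducer, input/output paths here ...

            // Register the cache file against the Job's own Configuration.
            // The Job constructor copies 'conf', so adding the file to 'conf'
            // after this point is not visible to the running job, which is
            // why getLocalCacheFiles() returned null in the mapper.
            DistributedCache.addCacheFile(new Path(args[0]).toUri(),
                    job.getConfiguration());

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new CacheDriver(), args));
        }
    }

With that in place, the setup() code calling DistributedCache.getLocalCacheFiles(context.getConfiguration()) should return the localized paths as expected.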
Thanks
Amareshwari

On 4/16/10 2:27 AM, "Ted Yu" <[email protected]> wrote:

Please take a look at the loop starting at line 158 in TaskRunner.java:

      p[i] = DistributedCache.getLocalCache(files[i], conf, new Path(baseDir),
          fileStatus, false, Long.parseLong(fileTimestamps[i]),
          new Path(workDir.getAbsolutePath()), false);
    }
    DistributedCache.setLocalFiles(conf, stringifyPathArray(p));

I think the confusing part is that DistributedCache.getLocalCacheFiles() is
paired with DistributedCache.setLocalFiles()

Cheers

On Thu, Apr 15, 2010 at 1:16 PM, Larry Compton <[email protected]> wrote:

> Ted,
>
> Thanks. I have looked at that example. The javadocs for DistributedCache
> still refer to deprecated classes, like JobConf. I'm trying to use the
> revised API.
>
> Larry
>
> On Thu, Apr 15, 2010 at 4:07 PM, Ted Yu <[email protected]> wrote:
>
> > Please see the sample within
> > src\core\org\apache\hadoop\filecache\DistributedCache.java:
> >
> >     * JobConf job = new JobConf();
> >     * DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"),
> >     *     job);
> >
> > On Thu, Apr 15, 2010 at 12:56 PM, Larry Compton <[email protected]> wrote:
> >
> > > I'm trying to use the distributed cache in a MapReduce job written to the
> > > new API (org.apache.hadoop.mapreduce.*). In my "Tool" class, a file path
> > > is added to the distributed cache as follows:
> > >
> > > public int run(String[] args) throws Exception {
> > >     Configuration conf = getConf();
> > >     Job job = new Job(conf, "Job");
> > >     ...
> > >     DistributedCache.addCacheFile(new Path(args[0]).toUri(), conf);
> > >     ...
> > >     return job.waitForCompletion(true) ? 0 : 1;
> > > }
> > >
> > > The "setup()" method in my mapper tries to read the path as follows:
> > >
> > > protected void setup(Context context) throws IOException {
> > >     Path[] paths = DistributedCache.getLocalCacheFiles(context
> > >         .getConfiguration());
> > > }
> > >
> > > But "paths" is null.
> > >
> > > I'm assuming I'm setting up the distributed cache incorrectly. I've seen a
> > > few hints in previous mailing list postings that indicate that the
> > > distributed cache is accessed via the Job and JobContext objects in the
> > > revised API, but the javadocs don't seem to support that.
> > >
> > > Thanks.
> > > Larry
