Hi Mark,

You need to pass the complete URI of the file on DFS to DistributedCache.addCacheFile. Please see
http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#DistributedCache
and
http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/filecache/DistributedCache.html
for the usage.
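For example, something along these lines should work (just a rough sketch built from your snippets, keeping your "lookup_file" option and "lookupfileName" key, and assuming the lookup file already sits on the default FileSystem):

    // imports: org.apache.hadoop.fs.Path, org.apache.hadoop.fs.FileSystem,
    //          org.apache.hadoop.filecache.DistributedCache, java.io.FileReader

    // Job-submission side: qualify the path so the cached URI carries the
    // hdfs:// scheme and authority of the default FileSystem.
    Path lookupPath = new Path(cl.getOptionValue("lookup_file"));
    Path qualified = lookupPath.makeQualified(FileSystem.get(conf));
    conf.set("lookupfileName", qualified.getName());
    DistributedCache.addCacheFile(qualified.toUri(), conf);

    // Mapper.setup side: getLocalCacheFiles() returns plain local paths, so
    // open them by path string rather than via Path.toUri(), whose URI has
    // no scheme and therefore fails the new File(URI) constructor.
    Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
    for (Path file : localFiles) {
        if (file.getName().equals(conf.get("lookupfileName"))) {
            parser.registerResource("bad_uas", new FileReader(file.toString()));
        }
    }

The key point is that addCacheFile wants a fully qualified DFS URI, while getLocalCacheFiles hands back local filesystem paths, so the two sides refer to the same file in different forms.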
Thanks,
Amareshwari

On 5/5/10 4:22 AM, "Mark Tozzi" <[email protected]> wrote:

Hi all,

I've been tinkering with Hadoop for some time, but I am new to the mailing list, so please forgive me if this has already been asked and answered. I am attempting to use the DistributedCache to allow my map/reduce job to access some lookup files. I have the following code to add the files to the distributed cache (showing only a single file for brevity):

    tmpPath = new Path(cl.getOptionValue("lookup_file"));
    conf.set("lookupfileName", tmpPath.getName());
    DistributedCache.addCacheFile(tmpPath.toUri(), conf);
    System.out.println("added " + tmpPath.toUri().toString() + " as " + tmpPath.getName());

and the following code in the Mapper.setup method to access these files:

    Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
    for (Path file : localFiles) {
        if (file.getName().equals(conf.get("lookupfileName"))) {
            parser.registerResource("bad_uas", new FileReader(new File(file.toUri())));
        }
        // further checks for other files in the cache
    }

This is generating the exception "java.lang.IllegalArgumentException: URI is not absolute" when I attempt to instantiate the File object. The registerResource method is currently designed to accept an instance of a Reader from which it pulls its information. That method is under my control, and I can reconfigure it to take a more appropriate input if one exists. I have tried a few variations on this specific approach, and all of them seem to come back to the "URI is not absolute" error. What is the piece I am missing here?

Thanks,
--Mark Tozzi
