Hi, create the "Job" after you create the configuration.
Like:

    Path p = new Path("hdfs://localhost:9100/user/denimLive/denim/DCache/Orders.txt");
    DistributedCache.addCacheFile(p.toUri(), conf);
    Job job = new Job(conf, "Driver");

If you create the "Job" before creating the configuration, DistributedCache doesn't work, for some reason.

Raja Thiruvathuru

On Thu, Jul 8, 2010 at 3:02 PM, Denim Live <denim.l...@yahoo.com> wrote:
> Hello all,
>
> As a new user of Hadoop, I am having some problems understanding a few
> things. I am writing a program that loads a file into the distributed
> cache and reads this file in each mapper. In my driver program, I have
> added the file to the distributed cache using:
>
> Path p = new Path("hdfs://localhost:9100/user/denimLive/denim/DCache/Orders.txt");
> DistributedCache.addCacheFile(p.toUri(), conf);
>
> In the configure method of the mapper, I am reading the file from the
> cache using:
>
> Path[] cacheFiles = DistributedCache.getFileClassPaths(conf);
> BufferedReader joinReader = new BufferedReader(new FileReader(cacheFiles[0].toString()));
>
> However, the cacheFiles variable is null.
>
> The Yahoo tutorial for Hadoop
> <http://developer.yahoo.com/hadoop/tutorial/module5.html> says something
> about the distributed cache that I do not understand:
>
> "As a cautionary note: if you use the local JobRunner in Hadoop (i.e.,
> what happens if you call JobClient.runJob() in a program with no or an
> empty hadoop-conf.xml accessible), then no local data directory is
> created; the getLocalCacheFiles() call will return an empty set of
> results. Unit test code should take this into account."
>
> What does this mean? I am executing my program in pseudo-distributed
> mode on Windows using Eclipse.
>
> Any suggestion in this regard is highly valued.
>
> Thanks in advance.

-- Raja Thiruvathuru
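A side note on the mapper code quoted above: getFileClassPaths() returns only paths registered via addFileToClassPath(), so it comes back null for files added with addCacheFile(). A minimal sketch of the mapper side, assuming the old (pre-0.21) org.apache.hadoop.mapred API and its configure() hook; class and field names here are made up for illustration:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

// Sketch of a mapper that opens a distributed-cache file in configure().
public class JoinMapper extends MapReduceBase {

    private BufferedReader joinReader;

    @Override
    public void configure(JobConf conf) {
        try {
            // getLocalCacheFiles() returns the task-local copies of files
            // that were registered with DistributedCache.addCacheFile(...).
            Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
            if (cacheFiles != null && cacheFiles.length > 0) {
                joinReader = new BufferedReader(
                        new FileReader(cacheFiles[0].toString()));
            }
        } catch (IOException e) {
            throw new RuntimeException("Could not read cache file", e);
        }
    }
}
```

Note the null check: as the Yahoo tutorial excerpt warns, getLocalCacheFiles() can return nothing under the local JobRunner, so guard for that case as well.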