Hello all,

As a new Hadoop user, I am having trouble understanding a few things. I am writing a program that loads a file into the distributed cache and reads that file in each mapper. In my driver program, I add the file to the distributed cache with:

    Path p = new Path("hdfs://localhost:9100/user/denimLive/denim/DCache/Orders.txt");
    DistributedCache.addCacheFile(p.toUri(), conf);
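For reference, here is a trimmed-down sketch of my driver (the class name and job name are just placeholders, and the input/output setup is omitted):

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class OrdersJoinDriver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(OrdersJoinDriver.class);
            conf.setJobName("orders-join");

            // Orders.txt is already on HDFS; this only records its URI in the job conf
            Path p = new Path("hdfs://localhost:9100/user/denimLive/denim/DCache/Orders.txt");
            DistributedCache.addCacheFile(p.toUri(), conf);

            // ... input/output paths, mapper and reducer classes set here ...

            JobClient.runJob(conf);
        }
    }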
In the mapper's configure() method, I read the file from the cache with:

    Path[] cacheFiles = DistributedCache.getFileClassPaths(conf);
    BufferedReader joinReader = new BufferedReader(new FileReader(cacheFiles[0].toString()));

However, cacheFiles comes back null. (A trimmed-down version of my mapper is at the end of this message.)

There is also something in the Yahoo! Hadoop tutorial about the distributed cache that I do not understand:

    "As a cautionary note: If you use the local JobRunner in Hadoop (i.e., what happens if you call JobClient.runJob() in a program with no or an empty hadoop-conf.xml accessible), then no local data directory is created; the getLocalCacheFiles() call will return an empty set of results. Unit test code should take this into account."

What does this mean? I am running my program in pseudo-distributed mode on Windows, from Eclipse.

Any suggestions are highly appreciated. Thanks in advance.
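Here is roughly what my mapper looks like (names are placeholders, the join logic is stripped out, and I am assuming plain text input, hence the LongWritable/Text types):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class OrdersJoinMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        public void configure(JobConf conf) {
            try {
                // This is the call that comes back null for me
                Path[] cacheFiles = DistributedCache.getFileClassPaths(conf);
                BufferedReader joinReader =
                        new BufferedReader(new FileReader(cacheFiles[0].toString()));
                // ... load Orders.txt into an in-memory map for the join ...
                joinReader.close();
            } catch (IOException e) {
                throw new RuntimeException("Could not read the cached file", e);
            }
        }

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            // ... join each input record against the cached Orders data ...
        }
    }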