Hello all,
As a new user of Hadoop, I am having some trouble understanding a few things. I am
writing a program that loads a file into the distributed cache and reads that file
in each mapper. In my driver program, I add the file to the distributed cache
using:
Path p = new Path("hdfs://localhost:9100/user/denimLive/denim/DCache/Orders.txt");
DistributedCache.addCacheFile(p.toUri(), conf);
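For context, the surrounding driver code looks roughly like this (trimmed down; the
class name MyDriver and the rest of the job setup are just placeholders):

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MyDriver.class);
        // ... mapper class, input/output paths, and other job setup ...

        // add Orders.txt to the distributed cache before submitting the job
        Path p = new Path("hdfs://localhost:9100/user/denimLive/denim/DCache/Orders.txt");
        DistributedCache.addCacheFile(p.toUri(), conf);

        JobClient.runJob(conf);
    }
}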
In the configure() method of the mapper, I read the file from the cache using:
Path[] cacheFiles = DistributedCache.getFileClassPaths(conf);
BufferedReader joinReader = new BufferedReader(new FileReader(cacheFiles[0].toString()));
However, the cacheFiles variable is null.
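For reference, the whole configure() method looks roughly like this (slightly
simplified; the class name JoinMapper is just a placeholder). The cacheFiles array
comes back null at the marked line:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class JoinMapper extends MapReduceBase /* implements Mapper<...> */ {
    private BufferedReader joinReader;

    public void configure(JobConf conf) {
        try {
            // this comes back null, which is the problem I am seeing
            Path[] cacheFiles = DistributedCache.getFileClassPaths(conf);
            joinReader = new BufferedReader(new FileReader(cacheFiles[0].toString()));
            // ... read Orders.txt into memory here ...
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}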
The Yahoo! Hadoop tutorial mentions something about the distributed cache that I do
not understand:
"As a cautionary note: If you use the local JobRunner in Hadoop (i.e., what happens
if you call JobClient.runJob() in a program with no or an empty hadoop-conf.xml
accessible), then no local data directory is created; the getLocalCacheFiles() call
will return an empty set of results. Unit test code should take this into account."
What does this mean? I am running my program in pseudo-distributed mode on Windows
using Eclipse.
Any suggestions in this regard would be greatly appreciated.
Thanks in advance.