Hello all,

As a new Hadoop user, I am having trouble understanding a few things. I am writing a program that loads a file into the distributed cache and reads that file in each mapper. In my driver program, I add the file to the distributed cache with:

    Path p = new Path("hdfs://localhost:9100/user/denimLive/denim/DCache/Orders.txt");
    DistributedCache.addCacheFile(p.toUri(), conf);
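For reference, here is a trimmed-down sketch of my driver (the class name and job name are just placeholders, and the input/output setup is omitted):

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class OrdersJoinDriver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(OrdersJoinDriver.class);
            conf.setJobName("orders-join");

            // Orders.txt is already on HDFS; this only records its URI in the job conf
            Path p = new Path("hdfs://localhost:9100/user/denimLive/denim/DCache/Orders.txt");
            DistributedCache.addCacheFile(p.toUri(), conf);

            // ... input/output paths, mapper and reducer classes set here ...

            JobClient.runJob(conf);
        }
    }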
In the mapper's configure() method, I read the file from the cache with:

    Path[] cacheFiles = DistributedCache.getFileClassPaths(conf);
    BufferedReader joinReader = new BufferedReader(new FileReader(cacheFiles[0].toString()));

However, cacheFiles comes back null. (A trimmed-down version of my mapper is at the end of this message.)

There is also something in the Yahoo! Hadoop tutorial about the distributed cache that I do not understand:

    "As a cautionary note: If you use the local JobRunner in Hadoop (i.e., what happens if you call JobClient.runJob() in a program with no or an empty hadoop-conf.xml accessible), then no local data directory is created; the getLocalCacheFiles() call will return an empty set of results. Unit test code should take this into account."

What does this mean? I am running my program in pseudo-distributed mode on Windows, from Eclipse.

Any suggestions are highly appreciated. Thanks in advance.
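Here is roughly what my mapper looks like (names are placeholders, the join logic is stripped out, and I am assuming plain text input, hence the LongWritable/Text types):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class OrdersJoinMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        public void configure(JobConf conf) {
            try {
                // This is the call that comes back null for me
                Path[] cacheFiles = DistributedCache.getFileClassPaths(conf);
                BufferedReader joinReader =
                        new BufferedReader(new FileReader(cacheFiles[0].toString()));
                // ... load Orders.txt into an in-memory map for the join ...
                joinReader.close();
            } catch (IOException e) {
                throw new RuntimeException("Could not read the cached file", e);
            }
        }

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            // ... join each input record against the cached Orders data ...
        }
    }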