On 6/6/2011 3:49 PM, Juan P. wrote:
Hi guys,
I'm going through Hadoop: The Definitive Guide trying to understand how to
use DistributedCache (0.20.2) to make a configuration file available to my
Mapper in every node of the cluster. The book says I should use
DistributedCache.addCacheFile to make the file available and then retrieve
it using DistributedCache.getLocalCacheFiles. However, when I run my job it
fails because the config file is not present. Looking at the implementation
of DistributedCache, addCacheFile sets the attribute mapred.cache.files
while getLocalCacheFiles retrieves the property mapred.cache.localFiles.
I figured maybe I should pair setLocalFiles with getLocalCacheFiles, or
getCacheFiles with addCacheFile, but I wanted to see if anyone had some
further explanation as to why the code might not be working.

Thanks!
Pony
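
For what it's worth, the mismatch between the two property names described above can be pictured with a toy model. This is only a sketch in plain Java, not Hadoop code: a Map stands in for the job configuration, the class CacheModel and its methods are hypothetical, and the localize step is an assumption about what the framework does on each node between job submission and task start (copy the file, then record where the local copy landed). The two property names are the ones mentioned in the question; the URI and directory are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: a plain Map stands in for the job configuration.
// Nothing here is Hadoop code; class and method names are hypothetical.
class CacheModel {
    static final String CACHE_FILES = "mapred.cache.files";
    static final String LOCAL_FILES = "mapred.cache.localFiles";

    // Client side: record the URI of a file that should travel with the job.
    static void addCacheFile(String uri, Map<String, String> conf) {
        String existing = conf.get(CACHE_FILES);
        conf.put(CACHE_FILES, existing == null ? uri : existing + "," + uri);
    }

    // Framework side (assumed): "copy" each file to the node and
    // record the resulting local paths under the second property.
    static void localize(Map<String, String> conf, String localDir) {
        String files = conf.get(CACHE_FILES);
        if (files == null) {
            return;
        }
        StringBuilder local = new StringBuilder();
        for (String uri : files.split(",")) {
            String name = uri.substring(uri.lastIndexOf('/') + 1);
            if (local.length() > 0) {
                local.append(',');
            }
            local.append(localDir).append('/').append(name);
        }
        conf.put(LOCAL_FILES, local.toString());
    }

    // Task side: read the local paths; empty if nothing was localized.
    static String[] getLocalCacheFiles(Map<String, String> conf) {
        String local = conf.get(LOCAL_FILES);
        return local == null ? new String[0] : local.split(",");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        addCacheFile("hdfs://namenode/conf/app.properties", conf);
        // Before localization the task-side lookup finds nothing.
        System.out.println(getLocalCacheFiles(conf).length);
        localize(conf, "/tmp/cache");
        System.out.println(getLocalCacheFiles(conf)[0]);
    }
}
```

If that picture is right, the two calls are meant to read different properties, and an empty result on the task side would point at the localization step not having happened for that configuration rather than at the wrong getter being used.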

I still don't understand. In a cluster you have a directory shared by all the nodes, right? Just put the configuration file in that directory and load it in all the mappers. Isn't that simple? So I still don't see why you would bother with DistributedCache; the only reasons might be that the shared directory is costly for the network and usually has a storage limit.

Shi
