John,

I'm not 100% clear on what you meant. Are you saying I should put the file into my HDFS cluster, or should I use DistributedCache? If you suggest the latter, could you address my original question?
Thanks for your help!

Pony

On Mon, Jun 6, 2011 at 6:27 PM, John Armstrong <[email protected]> wrote:

> On Mon, 06 Jun 2011 16:14:14 -0500, Shi Yu <[email protected]> wrote:
> > I still don't understand, in a cluster you have a shared directory to
> > all the nodes, right? Just put the configuration file in that directory
> > and load it in all the mappers, isn't that simple?
> > So I still don't understand why bother DistributedCache, the only reason
> > might be the shared directory is costly for network and usually has
> > storage limit.
>
> That's exactly the problem the DistributedCache is designed for. It
> guarantees that you only need to copy the file to any given local
> filesystem once. Using the way you suggest, if there are a hundred mappers
> on a given node they'd all need to make a local copy of the file instead of
> just making one local copy and moving it around locally from then on.
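
For reference, the copy-once-per-node behaviour John describes can be sketched roughly as follows. This is plain Java and not the Hadoop API: the class name, the method names, and the "conf.xml" path are all made up for illustration, and the "shared store" is just a counter standing in for a network copy.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a single node; not part of Hadoop.
public class NodeLocalCacheSketch {
    // Local copies already present on this node, keyed by file path.
    private final Map<String, String> localCopies = new HashMap<>();
    // Number of times the file was copied from the shared store.
    private int remoteFetches = 0;

    // Stand-in for copying the file over the network from the shared directory.
    private String fetchFromSharedStore(String path) {
        remoteFetches++;
        return "contents of " + path;
    }

    // Cache-style read: only the first reader on the node triggers a copy.
    String readViaNodeCache(String path) {
        return localCopies.computeIfAbsent(path, this::fetchFromSharedStore);
    }

    // Shared-directory-style read: every reader makes its own copy.
    String readDirect(String path) {
        return fetchFromSharedStore(path);
    }

    int remoteFetches() {
        return remoteFetches;
    }

    // Simulates `mappers` tasks on one node each reading the same file.
    static int fetchesFor(int mappers, boolean useCache) {
        NodeLocalCacheSketch node = new NodeLocalCacheSketch();
        for (int i = 0; i < mappers; i++) {
            if (useCache) {
                node.readViaNodeCache("conf.xml");
            } else {
                node.readDirect("conf.xml");
            }
        }
        return node.remoteFetches();
    }

    public static void main(String[] args) {
        System.out.println("without cache: " + fetchesFor(100, false));
        System.out.println("with cache:    " + fetchesFor(100, true));
    }
}
```

With a hundred simulated mappers on one node, the direct path copies the file a hundred times and the cached path copies it once, which is the difference the quoted reply is pointing at.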
