John,
I'm not 100% clear on what you meant. Are you saying I should put the file
into my HDFS cluster, or that I should use DistributedCache? If you suggest
the latter, could you address my original question?

Thanks for your help!
Pony

On Mon, Jun 6, 2011 at 6:27 PM, John Armstrong <[email protected]> wrote:

> On Mon, 06 Jun 2011 16:14:14 -0500, Shi Yu <[email protected]> wrote:
> > I still don't understand, in a cluster you have a shared directory to
> > all the nodes, right? Just put the configuration file in that directory
> > and load it in all the mappers, isn't that simple?
> > So I still don't understand why bother with DistributedCache; the only
> > reason might be that the shared directory is costly for the network and
> > usually has a storage limit.
>
> That's exactly the problem the DistributedCache is designed for.  It
> guarantees that you only need to copy the file to any given local
> filesystem once.  Using the way you suggest, if there are a hundred mappers
> on a given node they'd all need to make a local copy of the file instead of
> just making one local copy and moving it around locally from then on.
>
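
To make the copy-count argument above concrete, here is a toy model in plain Java. It is only a sketch: it does not call any Hadoop API, the class and method names are hypothetical, and the figures (10 nodes, 100 mappers per node) are made up for illustration. It assumes, as John describes, that the file is copied to each node's local filesystem once and then reused by every mapper on that node.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: counts file copies, does not use Hadoop itself.
class LocalizeOnceDemo {

    // Shared-directory approach: every mapper pulls its own copy of the file.
    static int copiesWithSharedDirectory(int nodes, int mappersPerNode) {
        int copies = 0;
        for (int node = 0; node < nodes; node++) {
            for (int mapper = 0; mapper < mappersPerNode; mapper++) {
                copies++; // each mapper fetches the file for itself
            }
        }
        return copies;
    }

    // Cache-style approach: the first mapper on a node triggers one copy,
    // and later mappers on the same node reuse the local file.
    static int copiesWithPerNodeCache(int nodes, int mappersPerNode) {
        Set<Integer> localizedNodes = new HashSet<>();
        int copies = 0;
        for (int node = 0; node < nodes; node++) {
            for (int mapper = 0; mapper < mappersPerNode; mapper++) {
                if (localizedNodes.add(node)) {
                    copies++; // first request on this node: copy once
                }
                // otherwise the file is already local, nothing to copy
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 10 nodes, 100 mappers on each node.
        System.out.println(copiesWithSharedDirectory(10, 100)); // 1000
        System.out.println(copiesWithPerNodeCache(10, 100));    // 10
    }
}
```

In this model the shared-directory approach grows with the total number of mappers, while the per-node cache grows only with the number of nodes.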
