Thanks for that link, Prashant - very useful.

Two brief follow-up questions:

1) Having put data in the cache, I would like to be a good citizen by deleting
   the data from the cache once I’ve finished - how do I do that?
2) Would it be simpler to pass the data as a value in the jobConf object?

Thanks,

        Andy D.

On 25 Nov 2011, at 12:14, Prashant Kommireddi wrote:

> I believe you want to ship data to each node in your cluster before MR
> begins so the mappers can access files local to their machine. Hadoop
> tutorial on YDN has some good info on this.
> 
> http://developer.yahoo.com/hadoop/tutorial/module5.html#auxdata
> 
> -Prashant Kommireddi
> 
> On Fri, Nov 25, 2011 at 1:05 AM, Andy Doddington <[email protected]> wrote:
> 
>> I have a series of mappers that I would like to be passed data using the
>> distributed cache mechanism. At the moment, I am using HDFS to pass the
>> data, but this seems wasteful to me, since they are all reading the same
>> data.
>> 
>> Is there a piece of example code that shows how data files can be placed
>> in the cache and accessed by mappers?
>> 
>> Thanks,
>> 
>>       Andy Doddington
>> 
>> 
