Thanks for that link, Prashant - very useful.
Two brief follow-up questions:
1) Having put data in the cache, I would like to be a good citizen by deleting
the data from the cache once I’ve finished - how do I do that?
2) Would it be simpler to pass the data as a value in the jobConf object?
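
To make question 2 concrete, this is roughly what I have in mind. It is only a sketch: a plain Map stands in for the jobConf object so that it runs without Hadoop on the classpath, and the class name, method names and the "lookup.data" key are made up for illustration. With a real JobConf I would expect the corresponding calls to be conf.set(key, value) in the driver and conf.get(key) in the mapper. It also assumes the data is small and that no line contains a newline.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a Map<String, String> stands in for the JobConf,
// so this compiles and runs without any Hadoop dependency.
class ConfPayloadSketch {

    // Driver side: pack the lines into a single string-valued property.
    static void putPayload(Map<String, String> conf, String key, List<String> lines) {
        String joined = String.join("\n", lines);
        // Base64 keeps the value safe to store as a plain property string.
        String encoded = Base64.getEncoder()
                .encodeToString(joined.getBytes(StandardCharsets.UTF_8));
        conf.put(key, encoded); // with Hadoop: conf.set(key, encoded)
    }

    // Mapper side: read the property back and unpack it into lines.
    static List<String> getPayload(Map<String, String> conf, String key) {
        String encoded = conf.get(key); // with Hadoop: conf.get(key)
        if (encoded == null) {
            return new ArrayList<>(); // property was never set
        }
        String joined = new String(Base64.getDecoder().decode(encoded),
                StandardCharsets.UTF_8);
        if (joined.isEmpty()) {
            return new ArrayList<>(); // nothing was stored
        }
        return new ArrayList<>(Arrays.asList(joined.split("\n", -1)));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        putPayload(conf, "lookup.data", Arrays.asList("a=1", "b=2"));
        System.out.println(getPayload(conf, "lookup.data"));
    }
}
```

My understanding is that the job configuration is copied to every task, so this would presumably only make sense for a small amount of data.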
Thanks,
Andy D.
On 25 Nov 2011, at 12:14, Prashant Kommireddi wrote:
> I believe you want to ship data to each node in your cluster before MR
> begins so the mappers can access files local to their machine. Hadoop
> tutorial on YDN has some good info on this.
>
> http://developer.yahoo.com/hadoop/tutorial/module5.html#auxdata
>
> -Prashant Kommireddi
>
> On Fri, Nov 25, 2011 at 1:05 AM, Andy Doddington <[email protected]> wrote:
>
>> I have a series of mappers that I would like to be passed data using the
>> distributed cache mechanism. At the moment, I am using HDFS to pass the
>> data, but this seems wasteful to me, since they are all reading the same
>> data.
>>
>> Is there a piece of example code that shows how data files can be placed
>> in the cache and accessed by mappers?
>>
>> Thanks,
>>
>> Andy Doddington