On 04/04/2012 05:00 PM, Kevin Savage wrote:
However, what we have is one big file of design data that needs to go to all the maps and many big files of climate data that need to go to one map each. I've not been able to work out if there is a good way of doing this in Hadoop.
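It sounds like the "one big file" belongs in the DistributedCache, while the "many big files" should be set up as the job input using some subclass of FileInputFormat. If each climate file has to reach a single map in one piece, that subclass can override isSplitable() to return false, so a file is never divided between maps.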
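Here is a rough sketch of how that could be wired up, assuming the new (org.apache.hadoop.mapreduce) API and line-oriented text files. The names ClimateJob, WholeFileTextInputFormat and ClimateMapper, the argument order, and the body of map() are all made up for illustration, and it assumes the design data fits in each task's heap. I have not run it against your data, so treat it as a starting point only.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: args = <design file URI> <climate input dir> <output dir>
public class ClimateJob {

    // Never split a climate file, so each file is read by exactly one map task.
    public static class WholeFileTextInputFormat extends TextInputFormat {
        @Override
        protected boolean isSplitable(JobContext context, Path file) {
            return false;
        }
    }

    public static class ClimateMapper extends Mapper<LongWritable, Text, Text, Text> {
        // Assumption: the design data is small enough to hold in memory.
        private final List<String> designData = new ArrayList<String>();

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // Every task gets a local copy of the cached design file.
            Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
            if (cached == null || cached.length == 0) {
                throw new IOException("design file not found in the DistributedCache");
            }
            BufferedReader reader = new BufferedReader(new FileReader(cached[0].toString()));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    designData.add(line);
                }
            } finally {
                reader.close();
            }
        }

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Placeholder logic: pair each climate record with the design-data line count.
            context.write(line, new Text(String.valueOf(designData.size())));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // new Job(conf, name) is deprecated in later releases in favour of Job.getInstance.
        Job job = new Job(conf, "climate-with-design-data");
        job.setJarByClass(ClimateJob.class);

        // The one big design file: shipped to every task via the DistributedCache.
        DistributedCache.addCacheFile(new URI(args[0]), job.getConfiguration());

        // The many climate files: ordinary job input, one unsplit file per map.
        job.setInputFormatClass(WholeFileTextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[1]));
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        job.setMapperClass(ClimateMapper.class);
        job.setNumReduceTasks(0); // map-only job
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The design file has to be on HDFS already (or on whatever filesystem the URI names) before the job is submitted. If the climate files are not line-oriented text, you would need your own FileInputFormat subclass and RecordReader instead of TextInputFormat.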
hth