Re: Setting number of Maps

Doug Cutting Tue, 03 Jul 2007 09:40:09 -0700

You could define an InputFormat whose InputSplits are not files, but rather simply have a field that is a complex number. The complex field would be written and read by Writable#write() and Writable#readFields(). This InputFormat would ignore the input directory, since it is not a file-based input. Could that work?

Splits are serialized by the job client when the job is submitted and reconstituted by map tasks when they run. So the class that implements your split must be included in your job's jar, so that the split can be read.
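A minimal sketch of such a split and its round trip, written against plain java.io so it runs without Hadoop on the classpath. The class name ComplexSplit and the helpers serialize() and deserialize() are hypothetical; in a real job the class would implement Hadoop's InputSplit interface (and so Writable), and the framework, not these helpers, would do the writing and reading.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical split carrying one complex number instead of a file range.
// Assumption: a real version would implement Hadoop's InputSplit/Writable.
class ComplexSplit {
    double re;
    double im;

    // No-arg constructor so an empty instance can be created and then
    // filled in by readFields().
    ComplexSplit() {}

    ComplexSplit(double re, double im) {
        this.re = re;
        this.im = im;
    }

    // Same shape as Writable#write(DataOutput).
    void write(DataOutput out) throws IOException {
        out.writeDouble(re);
        out.writeDouble(im);
    }

    // Same shape as Writable#readFields(DataInput).
    void readFields(DataInput in) throws IOException {
        re = in.readDouble();
        im = in.readDouble();
    }

    // Stand-in for the job client serializing a split at submit time.
    static byte[] serialize(ComplexSplit split) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            split.write(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for a map task reconstituting the split before it runs.
    static ComplexSplit deserialize(byte[] data) {
        try {
            ComplexSplit split = new ComplexSplit();
            split.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return split;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The reading side needs the ComplexSplit class itself to rebuild the object, which is why the class has to be in the job's jar.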

You could supply the list of complex numbers in a local file that's read by InputFormat#getSplits(). This file could be named by a job property.
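A rough sketch of the planning step that getSplits() could perform, again in plain Java so it runs standalone. Everything named here is an assumption for illustration: the class ComplexSplitPlanner, the property name "complex.numbers.file", and the file layout of one number per line as "real imaginary" separated by whitespace. Each returned pair stands for one split, and therefore one map task.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that turns a local file of complex numbers into
// one split description per number.
class ComplexSplitPlanner {
    // Hypothetical job property that would name the local file.
    static final String NUMBERS_FILE_PROPERTY = "complex.numbers.file";

    // Parse lines of the assumed form "<real> <imaginary>".
    // Each returned {re, im} pair corresponds to one split.
    static List<double[]> planSplitsFromLines(List<String> lines) {
        List<double[]> splits = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected 'real imaginary': " + line);
            }
            splits.add(new double[] {
                Double.parseDouble(parts[0]),
                Double.parseDouble(parts[1])
            });
        }
        return splits;
    }

    // Read the local file on the client side and plan one split per number.
    static List<double[]> planSplitsFromFile(Path localFile) {
        try {
            return planSplitsFromLines(Files.readAllLines(localFile));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With 7500 lines in the file this would yield 7500 splits, independent of how the numbers would otherwise be packed into SequenceFiles.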


Doug

Oliver Haggarty wrote:

Hi,
I'm writing a mapreduce task that will take a load of complex numbers, do some processing on each, then return a double. As this processing will be complex and could take up to 10 minutes, I am using Hadoop to distribute this amongst many machines.
So ideally for each complex number I want a new map task to spread the load most efficiently. A typical run might have as many as 7500 complex numbers that need processing. I will eventually have access to a cluster of approximately 500 machines.
So far, the only way I can get one map task per complex number is to create a new SequenceFile for each number in the input directory. This takes a while though, and I was hoping I could just create a single SequenceFile holding all the complex numbers, and then use JobConf.setNumMapTasks(n) to get one map task per number in the file. This doesn't work though, and I end up with approx 60-70 complex numbers per map task (depending on the total number of input numbers).
Does anyone have any idea why this second method doesn't work? If it is not supposed to work in this way, are there any suggestions as to how to get a map per input record without having to put each one in a separate file?
Thanks in advance for any help,

Ollie
