Philip,

that was quick and precise. I learned something today. Thank you!

Antonio

On Fri, Dec 11, 2009 at 8:20 PM, Philip Zeyliger <[email protected]> wrote:
> Hi Antonio,
>
> Check out MapTask.java. When your job gets instantiated on the cluster, an
> InputSplit object is created for the task, using reflection. An InputSplit
> is a Writable, and, like all writables, it gets created with an empty
> constructor and initialized with readFields().
>
> If you implement write() and readFields() correctly (think of these as
> serialization and de-serialization functions), it should all work. See
> FileSplit for an example of how FileInputFormat does it.
>
> Cheers,
>
> -- Philip
>
> Here's a relevant code excerpt from MapTask.java:
>
> > void runOldMapper(final JobConf job,
> >                   final BytesWritable rawSplit,
> >                   final TaskUmbilicalProtocol umbilical,
> >                   TaskReporter reporter
> >                   ) throws IOException, InterruptedException,
> >                            ClassNotFoundException {
> >   InputSplit inputSplit = null;
> >   // reinstantiate the split
> >   try {
> >     inputSplit = (InputSplit)
> >       ReflectionUtils.newInstance(job.getClassByName(splitClass), job);
> >   } catch (ClassNotFoundException exp) {
> >     IOException wrap = new IOException("Split class " + splitClass +
> >                                        " not found");
> >     wrap.initCause(exp);
> >     throw wrap;
> >   }
> >   DataInputBuffer splitBuffer = new DataInputBuffer();
> >   splitBuffer.reset(split.getBytes(), 0, split.getLength());
> >   inputSplit.readFields(splitBuffer);
> >
> >
> > On Fri, Dec 11, 2009 at 11:03 AM, Antonio D'Ettole <[email protected]> wrote:
>
> > Hi,
> >
> > I've been trying to code a pretty simple InputFormat. The idea is this: I
> > have an array of numbers (say, the range [0-5000]) and I want each mapper
> > to receive a split of size 500, i.e. 500 LongWritables.
> >
> > This is an excerpt from the class extending InputSplit:
> >
> > public class myInputSplit extends InputSplit implements Writable {
> >
> >     long[] rows;
> >     myInputSplit(){ }
> >
> >     public myInputSplit(long[] rows) {
> >         this.rows=rows;
> >     }
> >
> >     .....
> >
> > }
> >
> > I also wrote the classes myInputFormat and myRecordReader (omitted).
> >
> > Now, the default constructor in the class above doesn't do much, but I had
> > to put it there anyway because Hadoop was throwing an exception at runtime
> > because it couldn't find said constructor. Obviously myInputFormat uses the
> > right constructor with the long[] argument, but Hadoop somehow seems to give
> > the mapper input splits which have been built using the default
> > constructor, which is used nowhere in my code. I can tell because I put a
> > breakpoint in the default constructor and yes, it is being called. As a
> > result, all the input splits that are processed by the mappers are "broken",
> > as the "rows" variable was never set.
> > Interestingly, I also put a breakpoint in the _right_ constructor and it is
> > also being called, by the getSplits() method in myInputFormat (which is
> > what one would expect).
> >
> > Does anybody have an idea why the default constructor is being called?
> >
> > I hope I was clear enough, thanks for your time.
> > Antonio
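
For anyone finding this thread later, here is a minimal sketch of the write()/readFields() pairing Philip describes, applied to a long[] field like the "rows" array in the question. It is an illustration under stated assumptions, not the code from the thread: RowsSplit and roundTrip are hypothetical names, and the class uses only java.io so that it runs without Hadoop on the classpath. In a real job the class would extend InputSplit and implement Writable, whose write(DataOutput) and readFields(DataInput) methods have the same shape.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical stand-in for a custom split. In a real job this class would
// extend InputSplit and implement org.apache.hadoop.io.Writable instead.
final class RowsSplit {
    private long[] rows = new long[0];

    // No-arg constructor: the split is instantiated reflectively on the
    // task side and only afterwards filled in by readFields().
    RowsSplit() { }

    RowsSplit(long[] rows) {
        this.rows = rows.clone();
    }

    long[] getRows() {
        return rows.clone();
    }

    // Serialize: a length prefix followed by each value.
    void write(DataOutput out) throws IOException {
        out.writeInt(rows.length);
        for (long row : rows) {
            out.writeLong(row);
        }
    }

    // Deserialize: read exactly what write() wrote, in the same order.
    void readFields(DataInput in) throws IOException {
        int n = in.readInt();
        rows = new long[n];
        for (int i = 0; i < n; i++) {
            rows[i] = in.readLong();
        }
    }

    // Imitates the hand-off to a task: serialize, create an empty instance
    // via reflection, then restore its state from the bytes.
    static RowsSplit roundTrip(RowsSplit original) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));

        RowsSplit copy = RowsSplit.class.getDeclaredConstructor().newInstance();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws Exception {
        RowsSplit copy = roundTrip(new RowsSplit(new long[] {0L, 1L, 2L}));
        System.out.println(Arrays.toString(copy.getRows()));
    }
}
```

The point of the sketch is that the long[] constructor only ever runs where the splits are computed; the copy seen by the mapper gets its state solely from readFields(), so an empty readFields() would leave "rows" unset.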
