Philip,

that was quick and precise. I learned something today. Thank you!
Antonio

On Fri, Dec 11, 2009 at 8:20 PM, Philip Zeyliger <[email protected]>wrote:

> Hi Antonio,
>
> Check out MapTask.java.  When your job gets instantiated on the cluster, an
> InputSplit object is created for the task, using reflection.  An InputSplit
> is a Writable, and, like all writables, it gets created with an empty
> constructor and initialized with readFields().
>
> If you implement write() and readFields() correctly (think of these as
> serialization and de-serialization functions), it should all work.  See
> FileSplit for an example of how FileInputFormat does it.
>
> Cheers,
>
> -- Philip
>
> Here's the relevant code excerpt from MapTask.java:
>
>
> >   void runOldMapper(final JobConf job,
> >                     final BytesWritable rawSplit,
> >                     final TaskUmbilicalProtocol umbilical,
> >                     TaskReporter reporter
> >                     ) throws IOException, InterruptedException,
> >                              ClassNotFoundException {
> >     InputSplit inputSplit = null;
> >     // reinstantiate the split
> >     try {
> >       inputSplit = (InputSplit)
> >         ReflectionUtils.newInstance(job.getClassByName(splitClass), job);
> >     } catch (ClassNotFoundException exp) {
> >       IOException wrap = new IOException("Split class " + splitClass +
> >                                          " not found");
> >       wrap.initCause(exp);
> >       throw wrap;
> >     }
> >     DataInputBuffer splitBuffer = new DataInputBuffer();
> >     splitBuffer.reset(split.getBytes(), 0, split.getLength());
> >     inputSplit.readFields(splitBuffer);
> >
>
>
>
> On Fri, Dec 11, 2009 at 11:03 AM, Antonio D'Ettole <[email protected]> wrote:
>
> > Hi,
> >
> > I've been trying to code a pretty simple InputFormat. The idea is this: I
> > have an array of numbers (say, the range [0-5000]) and I want each mapper to
> > receive a split of size 500, i.e. 500 LongWritables.
> >
> > This is an excerpt from the class extending InputSplit:
> >
> > public class myInputSplit extends InputSplit implements Writable {
> >
> >     long[] rows;
> >
> >     myInputSplit() { }
> >
> >     public myInputSplit(long[] rows) {
> >         this.rows = rows;
> >     }
> >
> >     .....
> >
> > }
> >
> > I also wrote the classes myInputFormat and myRecordReader (omitted).
> >
> > Now, the default constructor in the class above doesn't do much, but I had to
> > put it there anyway because Hadoop was throwing an exception at runtime
> > because it couldn't find said constructor. Obviously myInputFormat uses the
> > right constructor with the long[] argument, but Hadoop seems somehow to give
> > the mapper input splits which have been built using the default constructor,
> > which is used nowhere in my code. I can tell because I put a breakpoint in
> > the default constructor and yes, it is being called. As a result all the
> > input splits that are processed by the mappers are "broken" as the "rows"
> > variable was never set.
> > Interestingly, I also put a breakpoint in the _right_ constructor and it is
> > also being called, by the getSplits() method in myInputFormat (which is what
> > one would expect).
> >
> > Does anybody have an idea why the default constructor is being called?
> >
> > I hope I was clear enough, thanks for your time.
> > Antonio
> >
>
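
For reference, the round trip Philip describes (serialize with write(), create the object reflectively through the empty constructor, then fill it in with readFields()) can be sketched without a cluster. This is only an illustration under stated assumptions: the Writable interface below is a local stand-in with the same two methods as Hadoop's, RowsSplit and SplitRoundTrip are hypothetical names, and the real framework goes through ReflectionUtils and DataInputBuffer as in the MapTask.java excerpt quoted above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Local stand-in (assumption) for org.apache.hadoop.io.Writable,
// so the sketch compiles and runs without Hadoop on the classpath.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical split carrying the row numbers, modelled on myInputSplit.
class RowsSplit implements Writable {
    private long[] rows = new long[0];

    // Empty constructor: used when the split is re-created by reflection.
    // The object stays empty until readFields() is called on it.
    RowsSplit() { }

    // Constructor used on the client side, e.g. from getSplits().
    RowsSplit(long[] rows) { this.rows = rows.clone(); }

    long[] getRows() { return rows.clone(); }

    // Serialization: write the length first, then every row.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(rows.length);
        for (long row : rows) {
            out.writeLong(row);
        }
    }

    // De-serialization: read back exactly what write() produced, in the same order.
    @Override
    public void readFields(DataInput in) throws IOException {
        rows = new long[in.readInt()];
        for (int i = 0; i < rows.length; i++) {
            rows[i] = in.readLong();
        }
    }
}

class SplitRoundTrip {
    // Imitates the hand-off from the client to the map task:
    // serialize, instantiate via the empty constructor, then readFields().
    static RowsSplit roundTrip(RowsSplit original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));

            RowsSplit copy = RowsSplit.class.getDeclaredConstructor().newInstance();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("split needs a no-argument constructor", e);
        }
    }

    public static void main(String[] args) {
        RowsSplit copy = roundTrip(new RowsSplit(new long[] {0L, 1L, 2L}));
        System.out.println(Arrays.toString(copy.getRows())); // prints [0, 1, 2]
    }
}
```

If write() and readFields() are left empty, the copy comes back with no rows, which matches the "broken" splits described in the original question.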
