Antonio,

If you're interested in open sourcing this, I'd be interested in
using/helping.  We do something internally that's similar to this, and I've
been meaning to write a general-purpose version for a while.  Would love to
see what you've done and contribute back any changes we make.  GitHub?

Thanks,
Kevin

On Fri, Dec 11, 2009 at 11:23 AM, Antonio D'Ettole <[email protected]> wrote:

> Philip,
>
> that was quick and precise. I learned something today. Thank you!
> Antonio
>
> On Fri, Dec 11, 2009 at 8:20 PM, Philip Zeyliger <[email protected]> wrote:
>
> > Hi Antonio,
> >
> > Check out MapTask.java.  When your job gets instantiated on the cluster, an
> > InputSplit object is created for the task, using reflection.  An InputSplit
> > is a Writable, and, like all writables, it gets created with an empty
> > constructor and initialized with readFields().
> >
> > If you implement write() and readFields() correctly (think of these as
> > serialization and de-serialization functions), it should all work.  See
> > FileSplit for an example of how FileInputFormat does it.
> >
> > Cheers,
> >
> > -- Philip
> >
> > Here's a relevant code excerpt from MapTask.java:
> >
> >
> >   void runOldMapper(final JobConf job,
> >                     final BytesWritable rawSplit,
> >                     final TaskUmbilicalProtocol umbilical,
> >                     TaskReporter reporter
> >                     ) throws IOException, InterruptedException,
> >                              ClassNotFoundException {
> >     InputSplit inputSplit = null;
> >     // reinstantiate the split
> >     try {
> >       inputSplit = (InputSplit)
> >         ReflectionUtils.newInstance(job.getClassByName(splitClass), job);
> >     } catch (ClassNotFoundException exp) {
> >       IOException wrap = new IOException("Split class " + splitClass +
> >                                          " not found");
> >       wrap.initCause(exp);
> >       throw wrap;
> >     }
> >     DataInputBuffer splitBuffer = new DataInputBuffer();
> >     splitBuffer.reset(split.getBytes(), 0, split.getLength());
> >     inputSplit.readFields(splitBuffer);
> >
> > On Fri, Dec 11, 2009 at 11:03 AM, Antonio D'Ettole <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I've been trying to code a pretty simple InputFormat. The idea is this: I
> > > have an array of numbers (say, the range [0-5000]) and I want each mapper
> > > to receive a split of size 500, i.e. 500 LongWritables.
> > >
> > > This is an excerpt from the class extending InputSplit:
> > >
> > > public class myInputSplit extends InputSplit implements Writable {
> > >
> > >     long[] rows;
> > >
> > >     myInputSplit() { }
> > >
> > >     public myInputSplit(long[] rows) {
> > >         this.rows = rows;
> > >     }
> > >
> > >     .....
> > >
> > > }
> > >
> > > I also wrote the classes myInputFormat and myRecordReader (omitted).
> > >
> > > Now, the default constructor in the class above doesn't do much, but I had
> > > to put it there anyway because Hadoop was throwing an exception at runtime
> > > when it couldn't find said constructor. Obviously myInputFormat uses the
> > > right constructor with the long[] argument, but Hadoop somehow seems to give
> > > the mapper input splits that have been built using the default constructor,
> > > which is used nowhere in my code. I can tell because I put a breakpoint in
> > > the default constructor and yes, it is being called. As a result, all the
> > > input splits that are processed by the mappers are "broken", as the "rows"
> > > variable was never set.
> > > Interestingly, I also put a breakpoint in the _right_ constructor and it is
> > > also being called, by the getSplits() method in myInputFormat (which is what
> > > one would expect).
> > >
> > > Does anybody have an idea why the default constructor is being called?
> > >
> > > I hope I was clear enough, thanks for your time.
> > > Antonio
> > >
> >
>
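
For anyone finding this thread later, here is a minimal sketch of the
write()/readFields() pair Philip describes, for a split that carries a long[]
like the one in the original question. It is an illustration under stated
assumptions, not the code from this thread: SimpleWritable is a local stand-in
with the same two method signatures as Hadoop's Writable (so the example
compiles without Hadoop on the classpath), and RowsSplit, SplitRoundTrip and
roundTrip() are hypothetical names. In a real job the split would also extend
InputSplit, and the framework would do the reflective instantiation itself, as
in the MapTask.java excerpt quoted above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Assumption: local stand-in for org.apache.hadoop.io.Writable, declared here
// only so the sketch compiles without Hadoop.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical split holding row ids.
final class RowsSplit implements SimpleWritable {
    private long[] rows = new long[0];

    // No-arg constructor: needed because the object is created reflectively
    // and only afterwards populated through readFields().
    RowsSplit() { }

    RowsSplit(long[] rows) { this.rows = rows.clone(); }

    long[] getRows() { return rows.clone(); }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(rows.length);      // length prefix, then the values
        for (long row : rows) {
            out.writeLong(row);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        int n = in.readInt();           // must mirror write() field for field
        rows = new long[n];
        for (int i = 0; i < n; i++) {
            rows[i] = in.readLong();
        }
    }
}

final class SplitRoundTrip {
    // Imitates the task side: serialize the split, create an empty instance
    // via reflection, then restore its state with readFields().
    static RowsSplit roundTrip(RowsSplit original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));

            RowsSplit copy = RowsSplit.class.getDeclaredConstructor().newInstance();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        RowsSplit copy = roundTrip(new RowsSplit(new long[] {0L, 1L, 2L}));
        System.out.println(Arrays.toString(copy.getRows()));
    }
}
```

If readFields() is left empty, the copy comes back with no rows, which matches
the "broken" splits described in the original question: the default
constructor runs, but nothing ever restores the state.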
