On Tue, Jan 5, 2010 at 10:02 AM, Drew Farris <drew.far...@gmail.com> wrote:

> On Tue, Jan 5, 2010 at 11:46 AM, Drew Farris <drew.far...@gmail.com>
> wrote:
>
> >
> > Have you seen any cases where a class hierarchy of Writables is
> > established to do something like that? E.g the mapreduce jobs are
> > written to use VectorWritable, but subclasses (e.g
> > SparseVectorWritable) are available for specific needs?
> >
> >
> Bah, nevermind -- this is precisely what Mahout does today without
> separating the Vector and Writable portions into two separate classes.
> Serious brain lapse that one.
>

Yeah, that's exactly what isn't working well - Hadoop checks for an exact
match on classes, which kills proper OOD.  There may be a reason for it,
but I can't see it.
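
For reference, this is roughly the check that bites us -- a paraphrased
sketch of what SequenceFile.Writer.append (and the map output collector)
does, not the actual Hadoop source, and the subclass name in the comment
is hypothetical:

import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Sketch of Hadoop's exact-class check; subclasses of the declared
// value class are rejected rather than accepted polymorphically.
class ExactClassCheckSketch {
  private final Class<? extends Writable> valClass;

  ExactClassCheckSketch(Class<? extends Writable> valClass) {
    this.valClass = valClass;
  }

  void append(Writable val) throws IOException {
    // == comparison, not instanceof, so passing e.g. a hypothetical
    // SparseVectorWritable where VectorWritable was declared blows up.
    if (val.getClass() != valClass) {
      throw new IOException("wrong value class: " + val.getClass()
          + " is not " + valClass);
    }
    // ... otherwise serialize val ...
  }
}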


> Of course this would probably be a very straightforward approach to
> implement: simply separate out the Writable portion of each Vector
> implementation into its own class. The Writable implementation to use
> would be specified at runtime, and that would also determine which
> underlying Vector implementation is used. The price we pay for separating
> the Writable stuff from the Vectors is an extra class that implements
> Writable for each implementation. Since the Writable (and thus the
> implementation) to use is specified at runtime via options, there's no
> need for an ugly switch statement anywhere.
>

How would you specify which Writable implementation to use at runtime?  You
have Mappers and Reducers which are keyed on Writable types... you need
to pick which one to use.
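
I suppose something like the following could work -- purely a sketch, and
the conf key and class names below are made up, not anything that exists
today: read the concrete Writable class from the job Configuration, hand it
to setMapOutputValueClass / setOutputValueClass, and write the Mapper and
Reducer code against the base type.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

// Sketch: pick the concrete Writable implementation at runtime from a
// configuration option.  "vector.writable.class" and the default class
// name are hypothetical placeholders.
class RuntimeWritableChoice {

  static Class<? extends Writable> chooseValueClass(Configuration conf)
      throws ClassNotFoundException {
    // e.g. -D vector.writable.class=org.example.SparseVectorWritable
    String name = conf.get("vector.writable.class",
        "org.example.DenseVectorWritable");
    return Class.forName(name).asSubclass(Writable.class);
  }

  static Writable newValue(Configuration conf) throws ClassNotFoundException {
    // ReflectionUtils takes care of Configurable setup.
    return ReflectionUtils.newInstance(chooseValueClass(conf), conf);
  }
}

The driver would then call setMapOutputValueClass(chooseValueClass(conf)),
which keeps the exact-class check happy -- but that still leaves the
Mapper/Reducer generics keyed on the base type.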


> Theoretically one could even decouple the writable (serialization style)
> from the (in-memory) implementation, but I don't know if there is any need
> for that whatsoever.
>

Yeah, I'd like this, because the two different SparseVector impls have
different in-memory structure, but basically the same serialization
(key-value pairs of int and double).  I think I can find a way to get this
to work.  Just not sure how ugly it would get.
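
Roughly what I have in mind -- a sketch only; the class name, the conf
option, and the two build methods are placeholders, not existing Mahout
code:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Writable;

// One wire format (cardinality, nonzero count, then index/value pairs)
// shared by both sparse impls; only the in-memory layout differs.
class SparseVectorWritableSketch implements Writable, Configurable {
  private Configuration conf;
  private int cardinality;
  private int[] indices = new int[0];
  private double[] values = new double[0];

  public void setConf(Configuration conf) { this.conf = conf; }
  public Configuration getConf() { return conf; }

  public void write(DataOutput out) throws IOException {
    out.writeInt(cardinality);
    out.writeInt(indices.length);
    for (int i = 0; i < indices.length; i++) {
      out.writeInt(indices[i]);
      out.writeDouble(values[i]);
    }
  }

  public void readFields(DataInput in) throws IOException {
    cardinality = in.readInt();
    int nonZeros = in.readInt();
    indices = new int[nonZeros];
    values = new double[nonZeros];
    for (int i = 0; i < nonZeros; i++) {
      indices[i] = in.readInt();
      values[i] = in.readDouble();
    }
  }

  // Materialize whichever in-memory impl the job asked for
  // ("vector.sparse.sequential" is a made-up option).
  Object get() {
    boolean sequential =
        conf != null && conf.getBoolean("vector.sparse.sequential", false);
    return sequential
        ? buildSequentialImpl(cardinality, indices, values)
        : buildHashedImpl(cardinality, indices, values);
  }

  private static Object buildSequentialImpl(int card, int[] idx, double[] val) {
    throw new UnsupportedOperationException("placeholder for the array-backed impl");
  }

  private static Object buildHashedImpl(int card, int[] idx, double[] val) {
    throw new UnsupportedOperationException("placeholder for the map-backed impl");
  }
}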

  -jake
