Good point, and something that must be resolved. You're bordering on my
next desired change, which I may lump into this massive patch.

NamedVectorWritable extends NamedVector extends Vector. At first
glance, great, I can treat NamedVectorWritable like any old Vector in
my code and it's all compatible.

But NamedVectorWritable does not extend VectorWritable. It can't: Java
allows only one superclass, and it already extends NamedVector to match
the other *Writable classes.

This really ought to be a decorator pattern. DenseVectorWritable does
not need to extend DenseVector; it does not conceptually need to
walk/talk like a DenseVector. It should enclose a DenseVector which
you can get at. Decoration also spares us funky stuff like making all
Vector data structures protected for access by subclasses, which makes
me uneasy.

The *Writable hierarchy ought to mirror the *Vector hierarchy, and
this becomes possible with decoration.
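
To make the decorator idea concrete, here is a minimal sketch. Every
name in it (SketchWritable, SketchVector, SketchDenseVector,
SketchDenseVectorWritable) is a hypothetical stand-in, not Mahout's or
Hadoop's real class, and the byte layout is invented for illustration.
The only thing borrowed from Hadoop is the shape of the
write(DataOutput) / readFields(DataInput) pair.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for Hadoop's Writable so the sketch compiles alone.
interface SketchWritable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical, minimal vector API; not Mahout's real Vector interface.
interface SketchVector {
  int size();
  double get(int index);
}

final class SketchDenseVector implements SketchVector {
  // Stays private: no Writable subclass needs to reach into it.
  private final double[] values;

  SketchDenseVector(double[] values) {
    this.values = values.clone();
  }

  @Override public int size() { return values.length; }
  @Override public double get(int index) { return values[index]; }
}

// The decorator: it encloses a vector instead of extending one.
final class SketchDenseVectorWritable implements SketchWritable {
  private SketchVector vector;

  SketchDenseVectorWritable() {}

  SketchDenseVectorWritable(SketchVector vector) {
    this.vector = vector;
  }

  // Callers who want the vector ask for it explicitly.
  SketchVector get() { return vector; }

  @Override
  public void write(DataOutput out) throws IOException {
    // Uses only the public vector API: length first, then each value.
    out.writeInt(vector.size());
    for (int i = 0; i < vector.size(); i++) {
      out.writeDouble(vector.get(i));
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    double[] values = new double[in.readInt()];
    for (int i = 0; i < values.length; i++) {
      values[i] = in.readDouble();
    }
    vector = new SketchDenseVector(values);
  }
}
```

The point of the sketch is that serialization goes through size() and
get() only, so the vector's storage never has to be made protected.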

Then I don't care whether the file has NamedVectorWritable if I just
want a VectorWritable.

Almost. Actually you do need to tell Hadoop the exact class of what's
stored on disk. That should always be VectorWritable, which is
actually a meta-Writable -- it's a factory for other Writables.

So I guess the net proposal is -- we always use VectorWritable, and we
switch the design pattern to make things smoother, and all is well.
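
A rough sketch of what "always use VectorWritable" could look like,
under loud assumptions: the Demo* names are hypothetical, the one-byte
type tag is an invented wire format (not what Mahout actually writes),
and the class skips Hadoop's Writable interface so that it compiles
without Hadoop on the classpath. The name is modeled as a wrapper
around a vector purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical minimal vector API; not Mahout's real classes.
interface DemoVector {
  int size();
  double get(int index);
}

final class DemoDenseVector implements DemoVector {
  private final double[] values;

  DemoDenseVector(double[] values) { this.values = values.clone(); }

  @Override public int size() { return values.length; }
  @Override public double get(int index) { return values[index]; }
}

// A name wrapped around any vector (an assumption of this sketch).
final class DemoNamedVector implements DemoVector {
  private final String name;
  private final DemoVector delegate;

  DemoNamedVector(String name, DemoVector delegate) {
    this.name = name;
    this.delegate = delegate;
  }

  String getName() { return name; }

  @Override public int size() { return delegate.size(); }
  @Override public double get(int index) { return delegate.get(index); }
}

// The one class that would be declared as the file's value type.
final class DemoVectorWritable {
  private static final byte PLAIN = 0; // invented tag values
  private static final byte NAMED = 1;

  private DemoVector vector;

  DemoVectorWritable() {}

  DemoVectorWritable(DemoVector vector) { this.vector = vector; }

  DemoVector get() { return vector; }

  public void write(DataOutput out) throws IOException {
    // Record which kind of vector follows, then the payload.
    if (vector instanceof DemoNamedVector) {
      out.writeByte(NAMED);
      out.writeUTF(((DemoNamedVector) vector).getName());
    } else {
      out.writeByte(PLAIN);
    }
    out.writeInt(vector.size());
    for (int i = 0; i < vector.size(); i++) {
      out.writeDouble(vector.get(i));
    }
  }

  public void readFields(DataInput in) throws IOException {
    // The tag decides what gets rebuilt; callers only ever see DemoVector.
    byte tag = in.readByte();
    String name = (tag == NAMED) ? in.readUTF() : null;
    double[] values = new double[in.readInt()];
    for (int i = 0; i < values.length; i++) {
      values[i] = in.readDouble();
    }
    DemoVector dense = new DemoDenseVector(values);
    vector = (name == null) ? dense : new DemoNamedVector(name, dense);
  }
}
```

With something along these lines, a job that only needs rows calls
get() and never notices the names, so the extra pass to strip them goes
away.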


On Sun, Apr 18, 2010 at 7:44 PM, Jake Mannix <jake.man...@gmail.com> wrote:
> It's not just that it is complicated, it's that say you want to do
> clustering.  You make a SequenceFile of any old key type, and
> NamedVectorWritable as the value.  Now you can't use that file as input for
> any DistributedRowMatrix operation, you have to do a full pass over the data
> to peel off the names and spit out regular VectorWritables...
>
>  -jake
>
> On Apr 18, 2010 11:37 AM, "Sean Owen" <sro...@gmail.com> wrote:
>
> NamedVectorWritable would go with it.
>
> ... and if you're going to bring up that that gets a little
> complicated, I totally agree, and would love to get on a tangent about
> making this a decorator pattern rather than subclass.
>
> On Sun, Apr 18, 2010 at 7:26 PM, Jake Mannix <jake.man...@gmail.com> wrote:
>> What would be the Wri...
>
