Good point, and something that must be resolved. You're bordering on my next desired change, which I may lump into this massive patch.
NamedVectorWritable extends NamedVector, which extends Vector. At first glance that's great: I can treat a NamedVectorWritable like any old Vector in my code and it's all compatible. But NamedVectorWritable does not extend VectorWritable. It can't, because it extends NamedVector to match the other *Writable classes.

This really ought to be a decorator pattern. DenseVectorWritable does not need to extend DenseVector; it does not conceptually need to walk and talk like a DenseVector. It should enclose a DenseVector that you can get at. That also lets us avoid funky stuff like making all Vector data structures protected for access by subclasses, which makes me uneasy. The *Writable hierarchy ought to mirror the *Vector hierarchy, and decoration makes that possible. Then, if I just want a VectorWritable, I don't care whether the file holds NamedVectorWritables.

Almost. You do actually need to tell Hadoop the exact class of what's stored on disk. That should always be VectorWritable, which is really a meta-Writable: it's a factory for other Writables.

So I guess the net proposal is: we always use VectorWritable, we switch the design pattern to make things smoother, and all is well.

On Sun, Apr 18, 2010 at 7:44 PM, Jake Mannix <jake.man...@gmail.com> wrote:
> It's not just that it is complicated, it's that say you want to do
> clustering. You make a SequenceFile of any old key type, and
> NamedVectorWritable as the value. Now you can't use that file as input for
> any DistributedRowMatrix operation, you have to do a full pass over the data
> to peel off the names and spit out regular VectorWritables...
>
> -jake
>
> On Apr 18, 2010 11:37 AM, "Sean Owen" <sro...@gmail.com> wrote:
>
> NamedVectorWritable would go with it.
>
> ... and if you're going to bring up that that gets a little
> complicated, I totally agree, and would love to get on a tangent about
> making this a decorator pattern rather than subclass.
>
> On Sun, Apr 18, 2010 at 7:26 PM, Jake Mannix <jake.man...@gmail.com> wrote:
>> What would be the Wri...
>
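To make the decorator proposal concrete, here is a minimal Java sketch. Everything in it is hypothetical and simplified: Vector, DenseVector, NamedVector and VectorWritable below are stand-ins, not Mahout's actual classes, and SimpleWritable is a local interface with the same write/readFields methods as Hadoop's Writable so the example runs without Hadoop on the classpath. NamedVector decorates any Vector instead of extending a concrete one, DenseVector keeps its data private, and VectorWritable is the single class declared to Hadoop: it writes a one-byte type tag (an assumed encoding) ahead of the data, and readFields() uses that tag as a factory to rebuild the right concrete Vector.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class VectorWritableSketch {

  // Stand-in for org.apache.hadoop.io.Writable (same two methods) so this runs without Hadoop.
  interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
  }

  // Hypothetical minimal Vector API; not Mahout's real interface.
  interface Vector {
    int size();
    double get(int index);
  }

  static final class DenseVector implements Vector {
    private final double[] values; // private: no Writable subclass needs direct access
    DenseVector(double[] values) { this.values = values.clone(); }
    public int size() { return values.length; }
    public double get(int index) { return values[index]; }
  }

  // Decorator: wraps any Vector and adds a name, instead of extending a concrete vector class.
  static final class NamedVector implements Vector {
    private final Vector delegate;
    private final String name;
    NamedVector(Vector delegate, String name) { this.delegate = delegate; this.name = name; }
    public int size() { return delegate.size(); }
    public double get(int index) { return delegate.get(index); }
    String getName() { return name; }
    Vector getDelegate() { return delegate; }
  }

  // The one Writable declared to Hadoop: it encloses a Vector rather than being one.
  static final class VectorWritable implements SimpleWritable {
    private static final byte DENSE = 0; // hypothetical type tags
    private static final byte NAMED = 1;
    private Vector vector;

    VectorWritable() {}
    VectorWritable(Vector vector) { this.vector = vector; }
    Vector get() { return vector; }

    @Override
    public void write(DataOutput out) throws IOException { writeVector(out, vector); }

    @Override
    public void readFields(DataInput in) throws IOException { vector = readVector(in); }

    private static void writeVector(DataOutput out, Vector v) throws IOException {
      if (v instanceof NamedVector) {
        NamedVector named = (NamedVector) v;
        out.writeByte(NAMED);
        out.writeUTF(named.getName());
        writeVector(out, named.getDelegate()); // recurse into the wrapped vector
      } else {
        // Any other Vector is serialized as dense values.
        out.writeByte(DENSE);
        out.writeInt(v.size());
        for (int i = 0; i < v.size(); i++) {
          out.writeDouble(v.get(i));
        }
      }
    }

    // Factory step: the tag read from disk decides which concrete Vector to build.
    private static Vector readVector(DataInput in) throws IOException {
      byte tag = in.readByte();
      if (tag == NAMED) {
        String name = in.readUTF();
        return new NamedVector(readVector(in), name);
      }
      if (tag != DENSE) {
        throw new IOException("Unknown vector type tag: " + tag);
      }
      double[] values = new double[in.readInt()];
      for (int i = 0; i < values.length; i++) {
        values[i] = in.readDouble();
      }
      return new DenseVector(values);
    }
  }

  public static void main(String[] args) throws IOException {
    Vector original = new NamedVector(new DenseVector(new double[] {1.0, 2.5}), "doc-42");
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    new VectorWritable(original).write(new DataOutputStream(bytes));

    VectorWritable restored = new VectorWritable();
    restored.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    Vector v = restored.get(); // usable as a plain Vector by code that ignores names
    System.out.println(((NamedVector) v).getName() + " " + v.size() + " " + v.get(1)); // prints "doc-42 2 2.5"
  }
}
```

Because the declared value class is always VectorWritable in this sketch, a job that only wants plain vectors can call get() and never look at the name, which avoids the extra pass Jake describes for DistributedRowMatrix input. The one-byte tag is just one way to record the concrete type; it is an illustrative choice, not a claim about how Mahout encodes it.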