On Tue, Jan 5, 2010 at 9:18 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>  From what I can tell, in SequenceFile.Writer#append(Object key, Object
> value) (why on earth is it taking Objects?  shouldn't these be Writables?),

There's also a version that takes Writables. Why both exist, I don't
know, but I assume you're triggering the Object one.

> it does an explicit check of key.getClass == this.keyClass and
> value.getClass() == this.valueClass, which won't do any subclass matching
> (and so will fail if value.getClass() is DenseVector.class, and valueClass
> is SparseVector.class, or just Vector.class).

Yes, I've hit this too. You can't say the value or key class is an
interface, for this reason. It has to be the very class in use. I can
imagine reasons for this.
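
To make the failure mode concrete, here is a rough sketch of an exact-class
check next to the lenient alternative. It is not Hadoop's code: the nested
Vector types are empty stand-ins, and checkExact / isAssignable are
hypothetical helpers that only imitate the comparison described above.

```java
import java.io.IOException;

final class ClassCheckSketch {
    // Empty stand-ins for the vector types discussed in this thread (assumption).
    interface Vector {}
    static final class DenseVector implements Vector {}
    static final class SparseVector implements Vector {}

    // Imitates the exact-class comparison: no subclass or interface matching.
    static void checkExact(Class<?> declared, Object value) throws IOException {
        if (value.getClass() != declared) {
            throw new IOException("wrong value class: " + value.getClass().getName()
                + " is not " + declared.getName());
        }
    }

    // The lenient alternative, which would accept any implementation of the declared type.
    static boolean isAssignable(Class<?> declared, Object value) {
        return declared.isInstance(value);
    }

    // Convenience wrapper so callers can test the exact check without handling the exception.
    static boolean passesExact(Class<?> declared, Object value) {
        try {
            checkExact(declared, value);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Object value = new DenseVector();
        System.out.println("declared DenseVector:  " + passesExact(DenseVector.class, value));
        System.out.println("declared Vector:       " + passesExact(Vector.class, value));
        System.out.println("declared SparseVector: " + passesExact(SparseVector.class, value));
        System.out.println("isInstance(Vector):    " + isAssignable(Vector.class, value));
    }
}
```

Under a check like this, declaring the interface as the value class rejects
every concrete vector, which is why the declared class has to be the one
actually written.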


>  To avoid this kind of mess, it seems the proper approach in MAHOUT-205
> would be to have one overall VectorWritable class, which can
> serialize/deserialize all Vector implementations.  Right?  This is how I've

Yes, this is what I imagine.


> is a pain - you need to either move all the write(DataOutput) and
> readFields(DataInput) methods from the vector implementations into the new
> VectorWritable, and have a big switch statement deciding which one to call,

I would imagine the serialized form of a vector is the same for
SparseVector, DenseVector, etc. There's no question of representation.
You write out all the non-default elements.

Reading in, yes there is some element of choice, and your heuristic is
fine. VectorWritable creates a Vector which can be obtained by the
caller, and could be sparse or dense.
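
A minimal sketch of that idea, assuming a single wire format of size, count,
then (index, value) pairs for the non-default elements. Everything here is
hypothetical: Vec, DenseVec, and SparseVec are tiny stand-ins for the real
vector classes, zero is assumed to be the default value, and the "more than
half non-default means dense" rule is just one possible heuristic, not the
one proposed earlier in the thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

final class VectorWritableSketch {
    // Hypothetical minimal vector API; the real Vector interface is much larger.
    interface Vec {
        int size();
        double get(int index);
    }

    static final class DenseVec implements Vec {
        final double[] values;
        DenseVec(double[] values) { this.values = values; }
        public int size() { return values.length; }
        public double get(int index) { return values[index]; }
    }

    static final class SparseVec implements Vec {
        final int size;
        final Map<Integer, Double> entries = new TreeMap<>();
        SparseVec(int size) { this.size = size; }
        public int size() { return size; }
        public double get(int index) { return entries.getOrDefault(index, 0.0); }
    }

    // Same serialized form for every implementation: only non-default elements are written.
    static void write(Vec v, DataOutput out) throws IOException {
        int nonDefault = 0;
        for (int i = 0; i < v.size(); i++) {
            if (v.get(i) != 0.0) nonDefault++;
        }
        out.writeInt(v.size());
        out.writeInt(nonDefault);
        for (int i = 0; i < v.size(); i++) {
            double x = v.get(i);
            if (x != 0.0) {
                out.writeInt(i);
                out.writeDouble(x);
            }
        }
    }

    // The reader picks the representation; the threshold below is an assumed heuristic.
    static Vec read(DataInput in) throws IOException {
        int size = in.readInt();
        int nonDefault = in.readInt();
        if (2L * nonDefault > size) {
            double[] values = new double[size];
            for (int k = 0; k < nonDefault; k++) {
                int index = in.readInt();
                values[index] = in.readDouble();
            }
            return new DenseVec(values);
        }
        SparseVec sparse = new SparseVec(size);
        for (int k = 0; k < nonDefault; k++) {
            int index = in.readInt();
            double x = in.readDouble();
            sparse.entries.put(index, x);
        }
        return sparse;
    }

    // Serializes to an in-memory buffer and reads the vector back.
    static Vec roundTrip(Vec v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(v, new DataOutputStream(bytes));
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Vec mostlyZero = roundTrip(new DenseVec(new double[] {0.0, 0.0, 0.0, 5.0}));
        Vec mostlyFull = roundTrip(new DenseVec(new double[] {1.0, 2.0, 0.0}));
        System.out.println(mostlyZero.getClass().getSimpleName());
        System.out.println(mostlyFull.getClass().getSimpleName());
    }
}
```

The point of the sketch is that the writer never needs to know which
implementation it was handed, and the caller gets back whatever
representation the reader judged appropriate.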

Is your point that this won't do for some reason?
