On Tue, Jan 5, 2010 at 9:18 AM, Jake Mannix <jake.man...@gmail.com> wrote:
> From what I can tell, in SequenceFile.Writer#append(Object key, Object
> value) (why on earth is it taking Objects? shouldn't these be Writables?),

There's also a version that takes Writables. Why both exist, I don't know,
but I assume you're triggering the Object one.

> it does an explicit check of key.getClass() == this.keyClass and
> value.getClass() == this.valueClass, which won't do any subclass matching
> (and so will fail if value.getClass() is DenseVector.class, and valueClass
> is SparseVector.class, or just Vector.class).

Yes, I've hit this too. You can't say the value or key class is an
interface, for this reason. It has to be the very class in use. I can
imagine reasons for this.

> To avoid this kind of mess, it seems the proper approach in MAHOUT-205
> would be to have one overall VectorWritable class, which can
> serialize/deserialize all Vector implementations. Right? This is how I've

Yes, this is what I imagine.

> is a pain - you need to either move all the write(DataOutput) and
> readFields(DataInput) methods from the vector implementations into the new
> VectorWritable, and have a big switch statement deciding which one to call,

I would imagine the serialized form of a vector is the same for
SparseVector, DenseVector, etc. There's no question of representation on
the write side: you write out all the non-default elements. Reading in,
yes, there is some element of choice, and your heuristic is fine.
VectorWritable creates a Vector which can be obtained by the caller, and
which could be sparse or dense. Is your point that this won't do for some
reason?
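
To make the idea concrete, here is a rough sketch of what I have in mind. It is not the MAHOUT-205 patch and does not use the Hadoop or Mahout classes: Vector, DenseVector, SparseVector and VectorWritable below are simplified stand-ins written only for this example, and the "dense if at least half the entries are non-zero" rule is an assumed placeholder for whatever heuristic you settle on. The point is only that one wire format (size, count, then index/value pairs) serves every implementation, and the choice of implementation happens on read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

final class VectorWritableSketch {

    /** Hypothetical minimal vector interface, not Mahout's Vector. */
    interface Vector {
        int size();
        double get(int index);
        void set(int index, double value);
    }

    static final class DenseVector implements Vector {
        private final double[] values;
        DenseVector(int size) { values = new double[size]; }
        public int size() { return values.length; }
        public double get(int index) { return values[index]; }
        public void set(int index, double value) { values[index] = value; }
    }

    static final class SparseVector implements Vector {
        private final int size;
        private final Map<Integer, Double> values = new TreeMap<>();
        SparseVector(int size) { this.size = size; }
        public int size() { return size; }
        public double get(int index) { return values.getOrDefault(index, 0.0); }
        public void set(int index, double value) {
            if (value == 0.0) { values.remove(index); } else { values.put(index, value); }
        }
    }

    /** One wrapper for all implementations; mirrors the Writable method names. */
    static final class VectorWritable {
        private Vector vector;

        VectorWritable() { }
        VectorWritable(Vector vector) { this.vector = vector; }
        Vector get() { return vector; }

        /** Same serialized form for every implementation: non-default elements only. */
        void write(DataOutput out) throws IOException {
            int nonZero = 0;
            for (int i = 0; i < vector.size(); i++) {
                if (vector.get(i) != 0.0) { nonZero++; }
            }
            out.writeInt(vector.size());
            out.writeInt(nonZero);
            for (int i = 0; i < vector.size(); i++) {
                double v = vector.get(i);
                if (v != 0.0) { out.writeInt(i); out.writeDouble(v); }
            }
        }

        /** The only choice is made here. The 50% threshold is an assumption. */
        void readFields(DataInput in) throws IOException {
            int size = in.readInt();
            int nonZero = in.readInt();
            Vector result = (2L * nonZero >= size) ? new DenseVector(size) : new SparseVector(size);
            for (int k = 0; k < nonZero; k++) {
                int index = in.readInt();
                result.set(index, in.readDouble());
            }
            vector = result;
        }
    }

    /** Serializes and deserializes through an in-memory buffer. */
    static Vector roundTrip(Vector v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new VectorWritable(v).write(new DataOutputStream(bytes));
            VectorWritable read = new VectorWritable();
            read.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return read.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Vector mostlyEmpty = new DenseVector(100);
        mostlyEmpty.set(7, 3.5);
        Vector back = roundTrip(mostlyEmpty);
        System.out.println(back.getClass().getSimpleName() + " " + back.get(7));
    }
}
```

Since the value class handed to SequenceFile would then always be the one wrapper class, the exact-class check stops mattering, and callers just take whatever Vector the wrapper hands back.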