Yes, I think if we can convince ourselves that there won't be that
many different ways to represent a vector, then a simple boolean might
unify everything. This approach doesn't 'scale', but I don't know of
any other representations we must have.
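
To make the boolean idea concrete, here is a rough sketch. It assumes the
two representations in question are dense and sparse, which is my reading
rather than something settled above. The class and method names are
hypothetical, and it uses plain java.io streams rather than the real
Writable classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch only; not the actual VectorWritable code.
final class VectorFlagSketch {

  // Dense layout: flag = true, length, then every value in order.
  static void writeDense(DataOutput out, double[] values) throws IOException {
    out.writeBoolean(true);
    out.writeInt(values.length);
    for (double v : values) {
      out.writeDouble(v);
    }
  }

  // Sparse layout: flag = false, logical length, entry count, then (index, value) pairs.
  static void writeSparse(DataOutput out, int length, int[] indices, double[] values)
      throws IOException {
    out.writeBoolean(false);
    out.writeInt(length);
    out.writeInt(indices.length);
    for (int i = 0; i < indices.length; i++) {
      out.writeInt(indices[i]);
      out.writeDouble(values[i]);
    }
  }

  // The reader branches on the flag; for simplicity it always returns a dense array.
  static double[] read(DataInput in) throws IOException {
    boolean dense = in.readBoolean();
    double[] result = new double[in.readInt()];
    if (dense) {
      for (int i = 0; i < result.length; i++) {
        result[i] = in.readDouble();
      }
    } else {
      int entries = in.readInt();
      for (int i = 0; i < entries; i++) {
        int index = in.readInt();
        result[index] = in.readDouble();
      }
    }
    return result;
  }

  // Helpers that round-trip through a byte array, for illustration.
  static double[] denseRoundTrip(double[] values) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      writeDense(new DataOutputStream(bytes), values);
      return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static double[] sparseRoundTrip(int length, int[] indices, double[] values) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      writeSparse(new DataOutputStream(bytes), length, indices, values);
      return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The flag costs one byte per vector and the reader needs no class name. The
catch is the one already noted: a boolean only distinguishes two layouts,
so a third representation would force a format change.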

The issue of named vectors is interesting. There's not really such a
thing as an optional field in Hadoop serialization. You can fake it
with a boolean, but that starts to get messy.
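
For reference, faking an optional name with a boolean would look roughly
like the following. This is a minimal sketch with hypothetical names, again
using plain java.io rather than the real Writable classes, and it covers
only the name field, not the vector payload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch only; not the actual NamedVectorWritable code.
final class OptionalNameSketch {

  // Writes a presence flag, then the name only if there is one.
  static void writeName(DataOutput out, String name) throws IOException {
    out.writeBoolean(name != null);
    if (name != null) {
      out.writeUTF(name);
    }
  }

  // Reads the flag back and returns null when no name was written.
  static String readName(DataInput in) throws IOException {
    return in.readBoolean() ? in.readUTF() : null;
  }

  // Round-trips a name through a byte array, for illustration.
  static String roundTrip(String name) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      writeName(new DataOutputStream(bytes), name);
      return readName(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

One flag is tolerable. The mess comes when each further piece of optional
metadata needs its own flag and its own branch in both the writer and the
reader.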

Messy might be necessary if vectors take on more metadata, though I
can't envision much more. So perhaps it is right and proper to retain
a second serialization format in NamedVectorWritable, which is really
the "vector with metadata" serializer, versus VectorWritable's "pure
vector" serializer.

That split has a certain logic to me. It also gets rid of writing the
class name, which is indeed unpalatable.

Thoughts before I go tearing through again?
