Ah a fourth mechanism. Not bad per se, as long as there a Good Reason
to use Serializable in one place, Writable another, GSON elsewhere and
Avro as well.

Reasons come from goals and use cases. Right now IMHO there are really
two input / output formats for anything that interacts with the
outside world:

- Text of various stripes
- Vector Writable

Text is the ultimate lowest-common-denominator: human readable,
cross-language, but not efficient. Vector Writable is the opposite.
And between them, if I squint, that answers the use cases.

Wild idea: is that about right? What happens if GSON is removed, Avro
not used, Serializable not used?


Mahout's nature will always be a bit of a 'bazaar' project, really a
loose confederation of implementations that are not entirely
consistent. I imagine though that taking targeted shots at chunky
issues like this (and standardizing on Hadoop 0.20.x APIs for
instance) gets rid of 80% of the divergence. And that's pretty fine
for such a project. Better perhaps than many closed / proprietary code
bases.

Reply via email to