Ah a fourth mechanism. Not bad per se, as long as there a Good Reason to use Serializable in one place, Writable another, GSON elsewhere and Avro as well.
Reasons come from goals and use cases. Right now IMHO there are really two input / output formats for anything that interacts with the outside world: - Text of various stripes - Vector Writable Text is the ultimate lowest-common-denominator: human readable, cross-language, but not efficient. Vector Writable is the opposite. And between them, if I squint, that answers the use cases. Wild idea: is that about right? What happens if GSON is removed, Avro not used, Serializable not used? Mahout's nature will always be a bit of a 'bazaar' project, really a loose confederation of implementations that are not entirely consistent. I imagine though that taking targeted shots at chunky issues like this (and standardizing on Hadoop 0.20.x APIs for instance) gets rid of 80% of the divergence. And that's pretty fine for such a project. Better perhaps than many closed / proprietary code bases.
