Not sure if people saw this from Josh at Cloudera:
http://blog.cloudera.com/blog/2013/03/cloudera_ml_data_science_tools/
https://github.com/cloudera/ml
This is a nice shot at packaging up the various workflows around the core work of running k-means on Hadoop. Under the hood it uses pieces from Mahout, put together with things like Crunch, an Avro-based API, etc.
