Just a heads up, in case you haven't tracked the jira for this: this is a big commit that, aside from its primary content, brings in Spark 0.9 (in a separate module only) and updates Scala to 2.10.3 and ScalaTest to 2.0. The Spark module also has a CDH-compatible Maven profile (which I think still pulls CDH4 HDFS dependencies).
The exact Hadoop dependency is fairly irrelevant, since one can compile and set up Spark to work with whatever version, and the mahout-spark binaries will therefore be compatible with whatever HDFS without recompilation. -d
