So right now mahout-spark depends on mr-legacy. I did a quick refactoring, and it turns out it only _irrevocably_ depends on the following classes there:
MatrixWritable, VectorWritable, Varint/Varlong and VarintWritable, and ... *sigh* o.a.m.common.Pair.

So I just dropped those five classes into a new tiny mahout-hadoop module (to signify stuff that is directly relevant to serializing things to the DFS API) and completely removed mr-legacy and its transitive dependencies from the spark and spark-shell dependencies.

So non-CLI applications (shell scripts and embedded API use) actually only need the Spark dependencies (which come from the SPARK_HOME classpath, of course) and the Mahout jars: mahout-spark, mahout-math(-scala), mahout-hadoop and, optionally, mahout-spark-shell (for running the shell).

This of course still doesn't address the problem of drivers that want to throw more stuff onto the front-end classpath (such as a CLI parser), but at least it makes the transitive luggage of mr-legacy (and the size of the worker-shipped jars) much more tolerable. How does that sound?