So right now mahout-spark depends on mr-legacy.
I did a quick refactoring, and it turns out it only _irrevocably_ depends on
the following classes there:

MatrixWritable, VectorWritable, Varint/Varlong and VarintWritable, and ...
*sigh* o.a.m.common.Pair

So I just dropped those five classes into a new, tiny mahout-hadoop
module (to signify stuff that is directly relevant to serializing things to
the DFS API) and completely removed mrlegacy and its transitive dependencies
from the spark and spark-shell dependencies.

So non-CLI applications (shell scripts and embedded API use) actually only
need the Spark dependencies (which come from the SPARK_HOME classpath, of
course) and the Mahout jars (mahout-spark, mahout-math(-scala), mahout-hadoop
and, optionally, mahout-spark-shell for running the shell).
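
To make that concrete, here is a rough sketch of what the dependency list of
an embedding application might look like after the change. It assumes the
application is built with sbt, that the artifact ids match the module names
above, and that Spark is marked "provided" because it comes from SPARK_HOME.
The version strings are placeholders, not real releases.

```scala
// build.sbt fragment for a hypothetical embedding application.
// Assumption: artifact ids equal the module names from this thread.
val mahoutVersion = "x.y.z" // placeholder, not a real release number
val sparkVersion  = "x.y.z" // placeholder, not a real release number

libraryDependencies ++= Seq(
  // Spark is supplied by the SPARK_HOME classpath at run time,
  // so it is only needed for compilation.
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",

  // Mahout jars the application would need after the refactoring.
  "org.apache.mahout" % "mahout-math"       % mahoutVersion,
  "org.apache.mahout" % "mahout-math-scala" % mahoutVersion,
  "org.apache.mahout" % "mahout-hadoop"     % mahoutVersion,
  "org.apache.mahout" % "mahout-spark"      % mahoutVersion
  // mahout-spark-shell would be added only when running the shell;
  // mrlegacy does not appear anywhere in this list.
)
```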

This of course still doesn't address the problem of drivers that want to
throw more stuff onto the front-end classpath (such as a CLI parser), but at
least it makes the transitive luggage of mr-legacy (and the size of
worker-shipped jars) much more tolerable.

How does that sound?
