+1

On Fri, Jan 23, 2015 at 6:04 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> So right now mahout-spark depends on mr-legacy.
> I did a quick refactoring and it turns out it only _irrevocably_ depends on
> the following classes there:
>
> MatrixWritable, VectorWritable, Varint/Varlong and VarintWritable, and ...
> *sigh* o.a.m.common.Pair
>
> So I just dropped those five classes into a new tiny mahout-hadoop
> module (to signify stuff that is directly relevant to serializing things to
> DFS API) and completely removed mrlegacy and its transients from spark and
> spark-shell dependencies.
>
> So non-CLI applications (shell scripts and embedded API use) actually only
> need spark dependencies (which come from SPARK_HOME classpath, of course)
> and mahout jars (mahout-spark, mahout-math(-scala), mahout-hadoop and
> optionally mahout-spark-shell (for running shell)).
>
> This of course still doesn't address driver problems that want to throw
> more stuff into front-end classpath (such as cli parser) but at least it
> renders transitive luggage of mr-legacy (and the size of worker-shipped
> jars) much more tolerable.
>
> How does that sound?
>
