+1

On Fri, Jan 23, 2015 at 6:04 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> So right now mahout-spark depends on mr-legacy.
> I did quick refactoring and it turns out it only _irrevocably_ depends on
> the following classes there:
>
> MatrixWritable, VectorWritable, Varint/Varlong and VarintWritable, and ...
> *sigh* o.a.m.common.Pair
>
> So I just dropped those five classes into a new tiny mahout-hadoop
> module (to signify stuff that is directly relevant to serializing things to
> DFS API) and completely removed mrlegacy and its transients from spark and
> spark-shell dependencies.
>
> So non-cli applications (shell scripts and embedded api use) actually only
> need spark dependencies (which come from SPARK_HOME classpath, of course)
> and mahout jars (mahout-spark, mahout-math(-scala), mahout-hadoop and
> optionally mahout-spark-shell (for running shell)).
>
> This of course still doesn't address driver problems that want to throw
> more stuff into front-end classpath (such as cli parser) but at least it
> renders transitive luggage of mr-legacy (and the size of worker-shipped
> jars) much more tolerable.
>
> How does that sound?
>