+1. Is there a PR? You mention a "tiny mahout-hadoop" module. It would be nice to see how you've structured that, in case we can use the same model to solve the two remaining refactoring issues: 1) external dependencies in the spark module, and 2) no Spark or H2O in the release artifacts.
On Jan 23, 2015, at 6:45 PM, Shannon Quinn <squ...@gatech.edu> wrote:

Also +1

> On Jan 23, 2015, at 18:38, Andrew Palumbo <ap....@outlook.com> wrote:
>
> +1
>
> -------- Original message --------
> From: Dmitriy Lyubimov <dlie...@gmail.com>
> Date: 01/23/2015 6:06 PM (GMT-05:00)
> To: dev@mahout.apache.org
> Subject: Codebase refactoring proposal
>
> So right now mahout-spark depends on mr-legacy. I did a quick refactoring,
> and it turns out it only _irrevocably_ depends on the following classes
> there:
>
> MatrixWritable, VectorWritable, Varint/Varlong and VarintWritable, and ...
> *sigh* o.a.m.common.Pair
>
> So I just dropped those five classes into a new tiny mahout-hadoop module
> (to signify stuff that is directly relevant to serializing things to the
> DFS API) and completely removed mr-legacy and its transitive dependencies
> from the spark and spark-shell dependencies.
>
> So non-CLI applications (shell scripts and embedded API use) actually only
> need the Spark dependencies (which come from the SPARK_HOME classpath, of
> course) and the Mahout jars: mahout-spark, mahout-math(-scala),
> mahout-hadoop, and optionally mahout-spark-shell (for running the shell).
>
> This of course still doesn't address the driver problems of wanting to
> throw more stuff onto the front-end classpath (such as the CLI parser),
> but at least it makes the transitive baggage of mr-legacy (and the size of
> worker-shipped jars) much more tolerable.
>
> How does that sound?
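For readers following along, the module split described above could be wired up in Maven roughly like this. This is only a sketch of the proposal, not the actual PR: the `groupId`, `version`, and parent coordinates are illustrative assumptions, and the real mahout-hadoop POM may differ.

```xml
<!-- mahout-hadoop/pom.xml (hypothetical sketch; coordinates are assumptions) -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <parent>
    <groupId>org.apache.mahout</groupId>
    <artifactId>mahout</artifactId>
    <version>1.0-SNAPSHOT</version> <!-- assumed version -->
  </parent>

  <!-- Tiny module holding only the five DFS-serialization classes:
       MatrixWritable, VectorWritable, Varint/Varlong, VarintWritable,
       and o.a.m.common.Pair -->
  <artifactId>mahout-hadoop</artifactId>

  <dependencies>
    <!-- Writable interfaces come from the Hadoop client API -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-math</artifactId>
    </dependency>
  </dependencies>
</project>
```

mahout-spark would then declare a dependency on `mahout-hadoop` instead of the mr-legacy artifact, so workers ship only the small serialization jar rather than mr-legacy and its transitive dependencies.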