Looks like it is also requested by mahout-math; I wonder what is using it there.
At the very least, it needs to be synchronized to the one currently used by Spark.

[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ mahout-hadoop ---
[INFO] org.apache.mahout:mahout-hadoop:jar:1.0-SNAPSHOT
*[INFO] +- org.apache.mahout:mahout-math:jar:1.0-SNAPSHOT:compile*
[INFO] |  +- org.apache.commons:commons-math3:jar:3.2:compile
*[INFO] |  +- com.google.guava:guava:jar:16.0:compile*
[INFO] |  \- com.tdunning:t-digest:jar:2.0.2:compile
[INFO] +- org.apache.mahout:mahout-math:test-jar:tests:1.0-SNAPSHOT:test
[INFO] +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile
[INFO] |  +- org.apache.hadoop:hadoop-common:jar:2.2.0:compile
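One quick way to check which of the two Guava copies actually wins at runtime is to ask the JVM where it loaded a Guava class from. This is a generic classloader probe, not a Mahout API; the choice of ImmutableList as the probe class is arbitrary:

    // Run inside the Spark shell or an embedded driver. Prints the jar that
    // Guava classes are actually served from, which shows whether Mahout's
    // Guava 16.0 or the copy bundled with the Spark distribution is on top.
    val guavaJar = classOf[com.google.common.collect.ImmutableList[_]]
      .getProtectionDomain.getCodeSource.getLocation
    println(s"Guava loaded from: $guavaJar")

If the two builds disagree, pinning the version in Maven's dependencyManagement (or shading Guava) is the usual fix.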
On Fri, Jan 30, 2015 at 7:52 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

> Looks like Guava is in Spark.
>
> On Jan 29, 2015, at 4:03 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
> IndexedDataset uses Guava. Can't tell for sure, but it sounds like this
> would not be included, since I think it was taken from the mrlegacy jar.
>
> On Jan 25, 2015, at 10:52 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>
> ---------- Forwarded message ----------
> From: "Pat Ferrel" <p...@occamsmachete.com>
> Date: Jan 25, 2015 9:39 AM
> Subject: Re: Codebase refactoring proposal
> To: <dev@mahout.apache.org>
> Cc:
>
> > When you get a chance a PR would be good.
>
> Yes, it would. And not just for that.
>
> > As I understand it you are putting some class jars somewhere in the
> > classpath. Where? How?
>
> /bin/mahout
>
> (Computes 2 different classpaths. See 'bin/mahout classpath' vs.
> 'bin/mahout -spark'.)
>
> If I interpret the current shell code there correctly, the legacy path
> tries to use the examples assemblies if not packaged, or /lib if
> packaged. The true motivation of that significantly predates 2010, and I
> suspect only Benson knows the whole intent there.
>
> The spark path, which is really a quick hack of the script, tries to get
> only selected Mahout jars plus the locally installed Spark classpath,
> which I guess is just the shaded Spark jar in recent Spark releases. It
> also apparently tries to include /libs/*, which is never compiled in the
> unpackaged version, and I now think including it is a bug, because
> /libs/* is apparently legacy packaging and shouldn't be pulled into
> Spark jobs with a wildcard. I can't believe how lazy I am; I still have
> not found time to understand the Mahout build in all cases.
>
> I am not even sure if packaged Mahout will work with Spark, honestly,
> because of the /lib. Never tried that, since I mostly use application
> embedding techniques.
>
> > The same solution may apply to adding external dependencies and
> > removing the assembly in the Spark module. Which would leave only one
> > major build issue afaik.
>
> On Jan 24, 2015, at 11:53 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>
> > No, no PR. Only an experiment on a private branch. But I believe I have
> > sufficiently defined what I want to do in order to gauge whether we may
> > want to advance it some time later. The goal is a much lighter
> > dependency set for the Spark code: eliminate everything that is not a
> > compile-time dependency (and a lot of it comes in through legacy MR
> > code, which we of course don't use).
> >
> > Can't say I understand the remaining issues you are talking about,
> > though.
> >
> > If you are talking about compiling lib or a shaded assembly, no, this
> > doesn't do anything about it. Although the point is, as it stands, the
> > algebra and shell don't have any external dependencies but Spark and
> > these 4 (5?) Mahout jars, so they technically don't even need an
> > assembly (as demonstrated).
> >
> > As I said, it seems driver code is the only one that may need some
> > external dependencies, but that's a different scenario from those I am
> > talking about. But I am relatively happy with having the first two
> > working nicely at this point.
> >
> > On Sat, Jan 24, 2015 at 9:06 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
> >
> >> +1
> >>
> >> Is there a PR? You mention a "tiny mahout-hadoop" module. It would be
> >> nice to see how you've structured that, in case we can use the same
> >> model to solve the two remaining refactoring issues:
> >> 1) external dependencies in the spark module
> >> 2) no spark or h2o in the release artifacts.
> >>
> >> On Jan 23, 2015, at 6:45 PM, Shannon Quinn <squ...@gatech.edu> wrote:
> >>
> >> Also +1
> >>
> >> iPhone'd
> >>
> >>> On Jan 23, 2015, at 18:38, Andrew Palumbo <ap....@outlook.com> wrote:
> >>>
> >>> +1
> >>>
> >>> Sent from my Verizon Wireless 4G LTE smartphone
> >>>
> >>> -------- Original message --------
> >>> From: Dmitriy Lyubimov <dlie...@gmail.com>
> >>> Date: 01/23/2015 6:06 PM (GMT-05:00)
> >>> To: dev@mahout.apache.org
> >>> Subject: Codebase refactoring proposal
> >>>
> >>> So right now mahout-spark depends on mr-legacy. I did a quick
> >>> refactoring, and it turns out it only _irrevocably_ depends on the
> >>> following classes there:
> >>>
> >>> MatrixWritable, VectorWritable, Varint/Varlong and VarintWritable,
> >>> and ... *sigh* o.a.m.common.Pair
> >>>
> >>> So I just dropped those five classes into a new tiny mahout-hadoop
> >>> module (to signify stuff that is directly relevant to serializing
> >>> things to the DFS API) and completely removed mr-legacy and its
> >>> transitive dependencies from the spark and spark-shell modules.
> >>>
> >>> So non-CLI applications (shell scripts and embedded API use) actually
> >>> only need the Spark dependencies (which come from the SPARK_HOME
> >>> classpath, of course) and the Mahout jars (mahout-spark,
> >>> mahout-math(-scala), mahout-hadoop, and optionally mahout-spark-shell
> >>> for running the shell).
> >>>
> >>> This of course still doesn't address driver problems that want to
> >>> throw more stuff onto the front-end classpath (such as a CLI parser),
> >>> but at least it renders the transitive luggage of mr-legacy (and the
> >>> size of the worker-shipped jars) much more tolerable.
> >>>
> >>> How does that sound?
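For reference, the kind of embedded, non-CLI usage described in the thread would look roughly like the sketch below: only the four Mahout jars are shipped to workers, and everything else is expected to come from the installed Spark classpath. The jar paths and names here are illustrative, not actual build artifacts.

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical embedded driver setup. Spark itself is resolved from
    // SPARK_HOME; only the Mahout jars (including the new tiny mahout-hadoop
    // module, and no mr-legacy) are listed for shipping to the workers.
    val mahoutJars = Seq(
      "lib/mahout-math-1.0-SNAPSHOT.jar",
      "lib/mahout-math-scala-1.0-SNAPSHOT.jar",
      "lib/mahout-spark-1.0-SNAPSHOT.jar",
      "lib/mahout-hadoop-1.0-SNAPSHOT.jar")

    val conf = new SparkConf()
      .setMaster("local[2]")               // or a real cluster master URL
      .setAppName("mahout-embedded-example")
      .setJars(mahoutJars)                 // shipped to workers

    val sc = new SparkContext(conf)
    // ... run distributed algebra against sc here ...
    sc.stop()

This mirrors the claim above that algebra and shell use need no assembly: the worker-shipped payload is just these few jars rather than the transitive mr-legacy closure.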