Looks like it is also requested by mahout-math; I wonder what is using it there.
At the very least, it needs to be synchronized to the one currently used by Spark.

[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ mahout-hadoop ---
[INFO] org.apache.mahout:mahout-hadoop:jar:1.0-SNAPSHOT
*[INFO] +- org.apache.mahout:mahout-math:jar:1.0-SNAPSHOT:compile*
[INFO] |  +- org.apache.commons:commons-math3:jar:3.2:compile
*[INFO] |  +- com.google.guava:guava:jar:16.0:compile*
[INFO] |  \- com.tdunning:t-digest:jar:2.0.2:compile
[INFO] +- org.apache.mahout:mahout-math:test-jar:tests:1.0-SNAPSHOT:test
[INFO] +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile
[INFO] |  +- org.apache.hadoop:hadoop-common:jar:2.2.0:compile
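One quick way to check which of the two Guava copies actually wins at runtime is to ask the JVM where it loaded a Guava class from. This is a generic classloader probe, not a Mahout API; the choice of ImmutableList as the probe class is arbitrary:

    // Run inside the Spark shell or an embedded driver. Prints the jar that
    // Guava classes are actually served from, which shows whether Mahout's
    // Guava 16.0 or the copy bundled with the Spark distribution is on top.
    val guavaJar = classOf[com.google.common.collect.ImmutableList[_]]
      .getProtectionDomain.getCodeSource.getLocation
    println(s"Guava loaded from: $guavaJar")

If the two builds disagree, pinning the version in Maven's dependencyManagement (or shading Guava) is the usual fix.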
On Fri, Jan 30, 2015 at 7:52 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

> Looks like Guava is in Spark.
>
> On Jan 29, 2015, at 4:03 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
> IndexedDataset uses Guava. Can't tell for sure, but it sounds like this
> would not be included, since I think it was taken from the mrlegacy jar.
>
> On Jan 25, 2015, at 10:52 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>
> ---------- Forwarded message ----------
> From: "Pat Ferrel" <p...@occamsmachete.com>
> Date: Jan 25, 2015 9:39 AM
> Subject: Re: Codebase refactoring proposal
> To: <dev@mahout.apache.org>
> Cc:
>
> > When you get a chance a PR would be good.
>
> Yes, it would. And not just for that.
>
> > As I understand it you are putting some class jars somewhere in the
> > classpath. Where? How?
>
> /bin/mahout
>
> (Computes 2 different classpaths. See 'bin/mahout classpath' vs.
> 'bin/mahout -spark'.)
>
> If I interpret the current shell code there correctly, the legacy path
> tries to use the examples assemblies if not packaged, or /lib if
> packaged. The true motivation of that significantly predates 2010, and I
> suspect only Benson knows the whole intent there.
>
> The spark path, which is really a quick hack of the script, tries to get
> only selected Mahout jars plus the locally installed Spark classpath,
> which I guess is just the shaded Spark jar in recent Spark releases. It
> also apparently tries to include /libs/*, which is never compiled in the
> unpackaged version, and I now think including it is a bug, because
> /libs/* is apparently legacy packaging and shouldn't be pulled into
> Spark jobs with a wildcard. I can't believe how lazy I am; I still have
> not found time to understand the Mahout build in all cases.
>
> I am not even sure if packaged Mahout will work with Spark, honestly,
> because of the /lib. Never tried that, since I mostly use application
> embedding techniques.
>
> > The same solution may apply to adding external dependencies and
> > removing the assembly in the Spark module. Which would leave only one
> > major build issue afaik.
>
> On Jan 24, 2015, at 11:53 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>
> > No, no PR. Only an experiment on a private branch. But I believe I have
> > sufficiently defined what I want to do in order to gauge whether we may
> > want to advance it some time later. The goal is a much lighter
> > dependency set for the Spark code: eliminate everything that is not a
> > compile-time dependency (and a lot of it comes in through legacy MR
> > code, which we of course don't use).
> >
> > Can't say I understand the remaining issues you are talking about,
> > though.
> >
> > If you are talking about compiling lib or a shaded assembly, no, this
> > doesn't do anything about it. Although the point is, as it stands, the
> > algebra and shell don't have any external dependencies but Spark and
> > these 4 (5?) Mahout jars, so they technically don't even need an
> > assembly (as demonstrated).
> >
> > As I said, it seems driver code is the only one that may need some
> > external dependencies, but that's a different scenario from those I am
> > talking about. But I am relatively happy with having the first two
> > working nicely at this point.
> >
> > On Sat, Jan 24, 2015 at 9:06 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
> >
> >> +1
> >>
> >> Is there a PR? You mention a "tiny mahout-hadoop" module. It would be
> >> nice to see how you've structured that, in case we can use the same
> >> model to solve the two remaining refactoring issues:
> >> 1) external dependencies in the spark module
> >> 2) no spark or h2o in the release artifacts.
> >>
> >> On Jan 23, 2015, at 6:45 PM, Shannon Quinn <squ...@gatech.edu> wrote:
> >>
> >> Also +1
> >>
> >> iPhone'd
> >>
> >>> On Jan 23, 2015, at 18:38, Andrew Palumbo <ap....@outlook.com> wrote:
> >>>
> >>> +1
> >>>
> >>> Sent from my Verizon Wireless 4G LTE smartphone
> >>>
> >>> -------- Original message --------
> >>> From: Dmitriy Lyubimov <dlie...@gmail.com>
> >>> Date: 01/23/2015 6:06 PM (GMT-05:00)
> >>> To: dev@mahout.apache.org
> >>> Subject: Codebase refactoring proposal
> >>>
> >>> So right now mahout-spark depends on mr-legacy. I did a quick
> >>> refactoring, and it turns out it only _irrevocably_ depends on the
> >>> following classes there:
> >>>
> >>> MatrixWritable, VectorWritable, Varint/Varlong and VarintWritable,
> >>> and ... *sigh* o.a.m.common.Pair
> >>>
> >>> So I just dropped those five classes into a new tiny mahout-hadoop
> >>> module (to signify stuff that is directly relevant to serializing
> >>> things to the DFS API) and completely removed mr-legacy and its
> >>> transitive dependencies from the spark and spark-shell modules.
> >>>
> >>> So non-CLI applications (shell scripts and embedded API use) actually
> >>> only need the Spark dependencies (which come from the SPARK_HOME
> >>> classpath, of course) and the Mahout jars (mahout-spark,
> >>> mahout-math(-scala), mahout-hadoop, and optionally mahout-spark-shell
> >>> for running the shell).
> >>>
> >>> This of course still doesn't address driver problems that want to
> >>> throw more stuff onto the front-end classpath (such as a CLI parser),
> >>> but at least it renders the transitive luggage of mr-legacy (and the
> >>> size of the worker-shipped jars) much more tolerable.
> >>>
> >>> How does that sound?
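For reference, the kind of embedded, non-CLI usage described in the thread would look roughly like the sketch below: only the four Mahout jars are shipped to workers, and everything else is expected to come from the installed Spark classpath. The jar paths and names here are illustrative, not actual build artifacts.

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical embedded driver setup. Spark itself is resolved from
    // SPARK_HOME; only the Mahout jars (including the new tiny mahout-hadoop
    // module, and no mr-legacy) are listed for shipping to the workers.
    val mahoutJars = Seq(
      "lib/mahout-math-1.0-SNAPSHOT.jar",
      "lib/mahout-math-scala-1.0-SNAPSHOT.jar",
      "lib/mahout-spark-1.0-SNAPSHOT.jar",
      "lib/mahout-hadoop-1.0-SNAPSHOT.jar")

    val conf = new SparkConf()
      .setMaster("local[2]")               // or a real cluster master URL
      .setAppName("mahout-embedded-example")
      .setJars(mahoutJars)                 // shipped to workers

    val sc = new SparkContext(conf)
    // ... run distributed algebra against sc here ...
    sc.stop()

This mirrors the claim above that algebra and shell use need no assembly: the worker-shipped payload is just these few jars rather than the transitive mr-legacy closure.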