Right now com.tdunning.math.stats.TDigest is used by
OnlineSummarizer in mahout-math, and OnlineSummarizer is used in the
ResultAnalyzer port in math-scala. I guess the thing to do would be to
port OnlineSummarizer to math-scala and use the stream-lib
com.clearspring.analytics.stream.quantile.TDigest. From a quick look it
should be trivial. That way we could leave the t-digest jar out of the
assembly.
As it stands, ResultAnalyzer is used in the front end only.
commons.math3 might be slightly more complicated to get rid of.
On 05/01/2015 02:37 PM, Dmitriy Lyubimov wrote:
I'd rather switch to using stream-lib for t-digest. It is a much more widely
adopted distribution of it and is already part of the Spark dependencies, so
in the case of a Spark job it doesn't need to be packaged explicitly on the
backend classpath.
Depending on the backend's transitive jars may be a dangerous practice,
though, as we saw in the case of Guava. Nonetheless, for the sake of
standardizing things, I'd rather depend on stream-lib than on a
single-algorithm jar with an unclear support commitment.
On Fri, May 1, 2015 at 10:01 AM, Andrew Palumbo <[email protected]> wrote:
ResultAnalyzer is also used in SparkNaiveBayes.test (...).
-------- Original message --------
From: Andrew Palumbo <[email protected]>
Date: 05/01/2015 12:57 PM (GMT-05:00)
To: [email protected]
Subject: RE: dependency-reduced jar
I added T-digest and math3. The CLI Naive Bayes driver needs them,
specifically the ResultAnalyzer in TestNBDriver.
-------- Original message --------
From: Suneel Marthi <[email protected]>
Date: 05/01/2015 12:14 PM (GMT-05:00)
To: mahout <[email protected]>
Subject: Re: dependency-reduced jar
T-digest is being used in Mahout-MR. I believe it's also packaged as
part of Spark -> AddThis jar.
On Fri, May 1, 2015 at 12:11 PM, Pat Ferrel <[email protected]> wrote:
There is an assembly xml in
mahout/spark/src/main/assembly/dependency-reduced.xml. It contains
dependencies that are external to mahout but required for either the
client or backend executor distributed code.
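For readers who have not seen such a descriptor, here is a minimal sketch of how a Maven assembly file of this kind usually whitelists external jars. It is not the actual content of dependency-reduced.xml. The artifact coordinates are assumptions taken from the libraries named in this thread (scopt, t-digest, commons-math3), and the Scala suffix on scopt is a guess.

```xml
<!-- Hypothetical sketch, NOT the real dependency-reduced.xml. -->
<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2
                              http://maven.apache.org/xsd/assembly-1.1.2.xsd">
  <id>dependency-reduced</id>
  <formats>
    <format>jar</format>
  </formats>
  <includeBaseDirectory>false</includeBaseDirectory>
  <dependencySets>
    <dependencySet>
      <!-- Unpack the listed jars so their classes end up in one jar. -->
      <unpack>true</unpack>
      <scope>runtime</scope>
      <!-- Only the artifacts listed here are bundled; the rest is left out. -->
      <includes>
        <!-- Assumed coordinates: client-side option parsing. -->
        <include>com.github.scopt:scopt_2.10</include>
        <!-- Assumed coordinates: the two jars discussed in this thread. -->
        <include>com.tdunning:t-digest</include>
        <include>org.apache.commons:commons-math3</include>
      </includes>
    </dependencySet>
  </dependencySets>
</assembly>
```

With a layout like this, dropping a library from the assembly comes down to deleting its include line, provided nothing on the client or executor side still needs the class at runtime.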
Guava has recently been removed but scopt is still used by the client.
For some reason the following artifacts were added to the assembly and
I'm not sure why. This is only used with Spark.