T-digest is used in Mahout-MR. I believe it's also packaged as part of
Spark, by way of the AddThis jar.

On Fri, May 1, 2015 at 12:11 PM, Pat Ferrel <[email protected]> wrote:

> There is an assembly xml in
> mahout/spark/src/main/assembly/dependency-reduced.xml. It contains
> dependencies that are external to mahout but required for either the client
> or backend executor distributed code.
>
> Guava has recently been removed, but scopt is still used by the client. The
> following artifacts were added to the assembly, and I’m not sure why. This
> assembly is only used with Spark.
>
> <includes>
>   <include>com.github.scopt</include>
>   <include>com.tdunning:t-digest</include>
>   <include>org.apache.commons:commons-math3</include>
> </includes>
>
> Are these all used? Does anyone know where t-digest and math3 came from?
>
> I’d also like to propose that we create two jars, one for client and one
> for backend executors. There are three configs we need to work in: Spark
> standalone, yarn-client, and yarn-cluster. All these modes separate the needs
> of the client from the backend executors, but each has a slightly different
> way to get the classes it needs. I think separating into client and backend
> dependency jars will cover all cases, but we’ll have to explain how to
> launch code in each mode.
