T-digest is being used in Mahout-MR. I believe it's also packaged as part of Spark, via the AddThis jar.
On Fri, May 1, 2015 at 12:11 PM, Pat Ferrel <[email protected]> wrote:
> There is an assembly xml in
> mahout/spark/src/main/assembly/dependency-reduced.xml. It contains
> dependencies that are external to mahout but required for either the client
> or backend executor distributed code.
>
> Guava has recently been removed but scopt is still used by the client. For
> some reason the following artifacts were added to the assembly and I’m not
> sure why. This is only used with Spark.
>
> <includes>
>   <include>com.github.scopt</include>
>   <include>com.tdunning:t-digest</include>
>   <include>org.apache.commons:commons-math3</include>
> </includes>
>
> Are these all used? Does anyone know where t-digest and math3 came from?
>
> I’d also like to propose that we create two jars, one for client and one
> for backend executors. There are three configs we need to work in: Spark
> standalone, yarn-client, and yarn-cluster. All these modes separate the needs of
> the client from the backend executors but have slightly different ways to
> get the classes needed for each. I think separating into client and backend
> dependencies jars will cover all cases but we’ll have to explain how to
> launch code in each mode.
