There is an assembly XML descriptor at 
mahout/spark/src/main/assembly/dependency-reduced.xml. It lists dependencies 
that are external to Mahout but required by either the client code or the 
distributed backend executor code.

Guava has recently been removed, but scopt is still used by the client. The 
following artifacts were added to the assembly, and I’m not sure why. The 
assembly is only used with Spark.

<includes>
  <include>com.github.scopt</include>
  <include>com.tdunning:t-digest</include>
  <include>org.apache.commons:commons-math3</include>
</includes>

Are these all used? Does anyone know where t-digest and math3 came from?

I’d also like to propose that we create two jars, one for the client and one 
for the backend executors. There are three configurations we need to support: 
Spark standalone, yarn-client, and yarn-cluster. All of these modes separate 
the needs of the client from those of the backend executors, but each has a 
slightly different way of getting the needed classes to each side. I think 
separating the dependencies into a client jar and a backend jar will cover all 
cases, but we’ll have to explain how to launch code in each mode.
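
As a rough sketch of the split (not a tested descriptor), the include list 
above could be divided between two assembly descriptors along these lines. 
Only scopt's use by the client is known; putting t-digest and commons-math3 on 
the backend side is a placeholder assumption until someone confirms where they 
are used, and the file names in the comments are hypothetical.

```xml
<!-- hypothetical client descriptor, e.g. client-dependencies.xml -->
<includes>
  <!-- scopt is used by the client, as noted above -->
  <include>com.github.scopt</include>
</includes>

<!-- hypothetical backend descriptor, e.g. backend-dependencies.xml -->
<includes>
  <!-- assumption: listed here only as a placeholder; usage is not yet confirmed -->
  <include>com.tdunning:t-digest</include>
  <include>org.apache.commons:commons-math3</include>
</includes>
```

If an artifact turns out to be needed on both sides, it would simply appear in 
both lists.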
