[ 
https://issues.apache.org/jira/browse/MAHOUT-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257385#comment-14257385
 ] 

Dmitriy Lyubimov edited comment on MAHOUT-1636 at 12/23/14 7:22 PM:
--------------------------------------------------------------------

Correction.

The Spark shell classpath is printed by `mahout -spark classpath`.

For example, here is the Spark shell front-end classpath:
{panel}
bin/mahout -spark classpath | sed 's/:/\n/g'
Running on hadoop, using /home/dmitriy/tools/hadoop/bin/hadoop and HADOOP_CONF_DIR=

/home/dmitriy/projects/github/mahout/src/conf
/home/dmitriy/tools/java/lib/tools.jar
/home/dmitriy/projects/github/mahout/mahout-*.jar
/home/dmitriy/projects/github/mahout/math-scala/target/mahout-math-scala_2.10-1.0-SNAPSHOT.jar
/home/dmitriy/projects/github/mahout/math-scala/target/mahout-math-scala_2.10-1.0-SNAPSHOT-sources.jar
/home/dmitriy/projects/github/mahout/math-scala/target/mahout-math-scala_2.10-1.0-SNAPSHOT-tests.jar
/home/dmitriy/projects/github/mahout/mrlegacy/target/mahout-mrlegacy-1.0-SNAPSHOT.jar
/home/dmitriy/projects/github/mahout/mrlegacy/target/mahout-mrlegacy-1.0-SNAPSHOT-job.jar
/home/dmitriy/projects/github/mahout/mrlegacy/target/mahout-mrlegacy-1.0-SNAPSHOT-sources.jar
/home/dmitriy/projects/github/mahout/mrlegacy/target/mahout-mrlegacy-1.0-SNAPSHOT-tests.jar
/home/dmitriy/projects/github/mahout/math/target/mahout-math-1.0-SNAPSHOT.jar
/home/dmitriy/projects/github/mahout/math/target/mahout-math-1.0-SNAPSHOT-sources.jar
/home/dmitriy/projects/github/mahout/math/target/mahout-math-1.0-SNAPSHOT-tests.jar
/home/dmitriy/projects/github/mahout/spark/target/mahout-spark_2.10-1.0-SNAPSHOT.jar
/home/dmitriy/projects/github/mahout/spark/target/mahout-spark_2.10-1.0-SNAPSHOT-sources.jar
/home/dmitriy/projects/github/mahout/spark/target/mahout-spark_2.10-1.0-SNAPSHOT-tests.jar
/home/dmitriy/projects/github/mahout/spark-shell/target/mahout-spark-shell_2.10-1.0-SNAPSHOT.jar
/home/dmitriy/projects/github/mahout/spark-shell/target/mahout-spark-shell_2.10-1.0-SNAPSHOT-sources.jar
/home/dmitriy/projects/github/mahout/spark-shell/target/mahout-spark-shell_2.10-1.0-SNAPSHOT-tests.jar


/home/dmitriy/tools/spark/conf
/home/dmitriy/tools/spark/assembly/target/scala-2.10/spark-assembly-1.1.0-hadoop1.0.4.jar
/home/dmitriy/projects/github/mahout/lib/*.jar

{panel}
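
The `sed 's/:/\n/g'` in the command above depends on sed treating `\n` in the replacement as a newline, which GNU sed does but some other sed implementations may not. A minimal, more portable sketch using `tr` follows. The helper names `split_classpath` and `job_jars` are hypothetical and are not part of Mahout.

```shell
# Hypothetical helper (not part of Mahout): print one classpath entry per line.
split_classpath() {
  # tr translates each colon into a newline; grep -v drops empty entries
  # such as those produced by a doubled "::".
  printf '%s\n' "$1" | tr ':' '\n' | grep -v '^$'
}

# Hypothetical helper: keep only the entries that end in "-job.jar".
job_jars() {
  split_classpath "$1" | grep -e '-job\.jar$'
}
```

Assuming the classpath string has been captured in a variable `CP`, `split_classpath "$CP"` lists every entry and `job_jars "$CP"` narrows the list to the job jars discussed in this issue.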


was (Author: dlyubimov):
correction. 

the spark shell classpath is printed by `mahout -spark classpath`.

> Class dependencies for the spark module are put in a job.jar, which is very 
> inefficient
> ---------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1636
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1636
>             Project: Mahout
>          Issue Type: Bug
>          Components: spark
>    Affects Versions: 1.0-snapshot
>            Reporter: Pat Ferrel
>             Fix For: 1.0-snapshot
>
>
> Using a Maven plugin and an assembly job.xml, a job.jar is created with all 
> dependencies, including transitive ones. This job.jar is in 
> mahout/spark/target and is included in the classpath when a Spark job is run. 
> This allows dependency classes to be found at runtime, but the job.jar includes 
> a great deal that is not needed and that duplicates classes found in the 
> main mrlegacy job.jar. If the job.jar is removed, drivers will not find 
> needed classes. A better way of including class dependencies needs to be 
> implemented.
> I'm not sure what that better way is, so I am leaving the assembly alone for 
> now. Whoever picks up this Jira will have to remove it after deciding on a 
> better method.
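
One way to gauge the duplication described above is to compare the entry listings of the two job jars. The sketch below assumes each listing has already been saved to a text file, one entry per line (for example with `jar tf some.jar > listing.txt`). The helper name `common_classes` is hypothetical and is not part of Mahout.

```shell
# Hypothetical helper (not part of Mahout): given two files that each hold
# a jar listing (one entry per line), print the .class entries found in both.
common_classes() {
  a_sorted="$(mktemp)"
  b_sorted="$(mktemp)"
  # comm needs sorted input; LC_ALL=C keeps sort and comm on the same collation.
  grep '\.class$' "$1" | LC_ALL=C sort -u > "$a_sorted"
  grep '\.class$' "$2" | LC_ALL=C sort -u > "$b_sorted"
  # -12 suppresses lines unique to either file, leaving the intersection.
  LC_ALL=C comm -12 "$a_sorted" "$b_sorted"
  rm -f "$a_sorted" "$b_sorted"
}
```

Piping the result through `wc -l` gives a rough count of the classes shipped in both jars.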



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
