[
https://issues.apache.org/jira/browse/MAHOUT-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257385#comment-14257385
]
Dmitriy Lyubimov edited comment on MAHOUT-1636 at 12/23/14 7:22 PM:
--------------------------------------------------------------------
Correction: the Spark shell classpath is printed by `mahout -spark classpath`.
For example, here is the Spark shell front-end classpath:
{panel}
bin/mahout -spark classpath | sed 's/:/\n/g'
Running on hadoop, using /home/dmitriy/tools/hadoop/bin/hadoop and
HADOOP_CONF_DIR=
/home/dmitriy/projects/github/mahout/src/conf
/home/dmitriy/tools/java/lib/tools.jar
/home/dmitriy/projects/github/mahout/mahout-*.jar
/home/dmitriy/projects/github/mahout/math-scala/target/mahout-math-scala_2.10-1.0-SNAPSHOT.jar
/home/dmitriy/projects/github/mahout/math-scala/target/mahout-math-scala_2.10-1.0-SNAPSHOT-sources.jar
/home/dmitriy/projects/github/mahout/math-scala/target/mahout-math-scala_2.10-1.0-SNAPSHOT-tests.jar
/home/dmitriy/projects/github/mahout/mrlegacy/target/mahout-mrlegacy-1.0-SNAPSHOT.jar
/home/dmitriy/projects/github/mahout/mrlegacy/target/mahout-mrlegacy-1.0-SNAPSHOT-job.jar
/home/dmitriy/projects/github/mahout/mrlegacy/target/mahout-mrlegacy-1.0-SNAPSHOT-sources.jar
/home/dmitriy/projects/github/mahout/mrlegacy/target/mahout-mrlegacy-1.0-SNAPSHOT-tests.jar
/home/dmitriy/projects/github/mahout/math/target/mahout-math-1.0-SNAPSHOT.jar
/home/dmitriy/projects/github/mahout/math/target/mahout-math-1.0-SNAPSHOT-sources.jar
/home/dmitriy/projects/github/mahout/math/target/mahout-math-1.0-SNAPSHOT-tests.jar
/home/dmitriy/projects/github/mahout/spark/target/mahout-spark_2.10-1.0-SNAPSHOT.jar
/home/dmitriy/projects/github/mahout/spark/target/mahout-spark_2.10-1.0-SNAPSHOT-sources.jar
/home/dmitriy/projects/github/mahout/spark/target/mahout-spark_2.10-1.0-SNAPSHOT-tests.jar
/home/dmitriy/projects/github/mahout/spark-shell/target/mahout-spark-shell_2.10-1.0-SNAPSHOT.jar
/home/dmitriy/projects/github/mahout/spark-shell/target/mahout-spark-shell_2.10-1.0-SNAPSHOT-sources.jar
/home/dmitriy/projects/github/mahout/spark-shell/target/mahout-spark-shell_2.10-1.0-SNAPSHOT-tests.jar
/home/dmitriy/tools/spark/conf
/home/dmitriy/tools/spark/assembly/target/scala-2.10/spark-assembly-1.1.0-hadoop1.0.4.jar
/home/dmitriy/projects/github/mahout/lib/*.jar
{panel}
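The listing above includes `-sources` and `-tests` jars alongside the main artifacts. A minimal sketch of the same split that the `sed` call performs, plus a filter for those entries, might look like this. The function names `split_classpath` and `non_runtime_jars` are hypothetical helpers for illustration and are not part of Mahout; the suffix check assumes the Maven naming seen in the listing.

```python
def split_classpath(classpath, sep=":"):
    """Split a classpath string into its non-empty entries, preserving order.

    Mirrors `sed 's/:/\\n/g'` from the comment above; ":" is the Unix
    path separator and is assumed here.
    """
    return [entry for entry in classpath.strip().split(sep) if entry]


def non_runtime_jars(entries):
    """Return entries whose names suggest source or test artifacts.

    Assumption: Maven-style "-sources.jar" / "-tests.jar" suffixes, as in
    the listing above. Other naming schemes are not detected.
    """
    return [e for e in entries if e.endswith(("-sources.jar", "-tests.jar"))]
```

Passing the colon-separated classpath string to `split_classpath` and the result to `non_runtime_jars` gives a quick view of which entries are probably not needed at runtime.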
was (Author: dlyubimov):
correction.
the spark shell classpath is printed by `mahout -spark classpath`.
> Class dependencies for the spark module are put in a job.jar, which is very
> inefficient
> ---------------------------------------------------------------------------------------
>
> Key: MAHOUT-1636
> URL: https://issues.apache.org/jira/browse/MAHOUT-1636
> Project: Mahout
> Issue Type: Bug
> Components: spark
> Affects Versions: 1.0-snapshot
> Reporter: Pat Ferrel
> Fix For: 1.0-snapshot
>
>
> Using a Maven plugin and an assembly job.xml, a job.jar is created with all
> dependencies, including transitive ones. This job.jar is in
> mahout/spark/target and is included in the classpath when a Spark job is run.
> This allows dependency classes to be found at runtime, but the job.jar includes
> a great deal of unneeded things that duplicate classes found in the
> main mrlegacy job.jar. If the job.jar is removed, drivers will not find
> needed classes. A better way of including class dependencies needs to be
> implemented.
> I'm not sure what that better way is, so I am leaving the assembly alone for
> now. Whoever picks up this Jira will have to remove it after deciding on a
> better method.
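One way to gauge the duplication described in the issue is to compare the class entries of two jars. The sketch below relies only on the fact that a jar is a zip archive; `class_entries` and `duplicate_classes` are hypothetical names, and the jar paths to compare (for example the spark and mrlegacy job jars) are left to the caller.

```python
import zipfile


def class_entries(jar_path):
    """Return the set of .class entry names stored in a jar (a zip archive)."""
    with zipfile.ZipFile(jar_path) as jar:
        return {name for name in jar.namelist() if name.endswith(".class")}


def duplicate_classes(jar_a, jar_b):
    """Return the sorted .class entries that appear in both jars."""
    return sorted(class_entries(jar_a) & class_entries(jar_b))
```

The length of the returned list, relative to the total number of class entries in each jar, gives a rough measure of how much of one jar repeats the other. It compares entry names only, not class contents or versions.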