Pat Ferrel created MAHOUT-1636:
----------------------------------
Summary: Class dependencies for the spark module are put in a
job.jar, which is very inefficient
Key: MAHOUT-1636
URL: https://issues.apache.org/jira/browse/MAHOUT-1636
Project: Mahout
Issue Type: Bug
Components: spark
Affects Versions: 1.0-snapshot
Reporter: Pat Ferrel
Fix For: 1.0-snapshot
Using a Maven plugin and an assembly job.xml, a job.jar is created with all
dependencies, including transitive ones. This job.jar is in mahout/spark/target
and is included in the classpath when a Spark job is run. This allows
dependency classes to be found at runtime, but the job.jar includes a great deal
that is not needed, duplicating classes already found in the main mrlegacy
job.jar. If the job.jar is removed, drivers will not find needed classes. A
better way of including class dependencies needs to be implemented.
I'm not sure what that better way is, so I am leaving the assembly alone for
now. Whoever picks up this Jira will have to remove it after deciding on a
better method.
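
To put a number on the duplication described above, something like the
following could be run against the two jars. This is only a rough sketch, not
part of the build: the function names are made up for illustration, and the
jar paths are passed on the command line because the exact file names under
mahout/spark/target and the mrlegacy target directory are not given here. It
relies only on the fact that a jar is a zip archive.

```python
import sys
import zipfile


def class_entries(jar_path):
    """Return the set of .class entry names in a jar (a jar is a zip archive)."""
    with zipfile.ZipFile(jar_path) as jar:
        return {name for name in jar.namelist() if name.endswith(".class")}


def duplicated_classes(spark_jar, mrlegacy_jar):
    """Return the sorted class entries that appear in both jars."""
    return sorted(class_entries(spark_jar) & class_entries(mrlegacy_jar))


def duplication_ratio(spark_jar, mrlegacy_jar):
    """Return the fraction of classes in spark_jar that also exist in mrlegacy_jar."""
    spark_classes = class_entries(spark_jar)
    if not spark_classes:
        return 0.0
    shared = spark_classes & class_entries(mrlegacy_jar)
    return len(shared) / len(spark_classes)


if __name__ == "__main__":
    # Hypothetical usage: python jar_overlap.py <spark job.jar> <mrlegacy job.jar>
    if len(sys.argv) != 3:
        sys.exit("usage: jar_overlap.py SPARK_JOB_JAR MRLEGACY_JOB_JAR")
    spark_jar, mrlegacy_jar = sys.argv[1], sys.argv[2]
    shared = duplicated_classes(spark_jar, mrlegacy_jar)
    ratio = duplication_ratio(spark_jar, mrlegacy_jar)
    print(f"{len(shared)} duplicated classes ({ratio:.1%} of the spark job.jar)")
```

The classes that appear only in the spark job.jar would be a starting point
for deciding which dependencies the drivers actually need on the classpath.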