[
https://issues.apache.org/jira/browse/MAHOUT-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257619#comment-14257619
]
Pat Ferrel commented on MAHOUT-1636:
------------------------------------
Drivers use mahoutSparkContext, which does the jar scan by invoking
`mahout -spark classpath`.
The external libraries I use are scopt, used by the frontend, and guava, used
by the backend. Since the frontend fails first, I'm not sure whether there are
backend failures, but I think guava is in the mrlegacy job jar, so the backend
may not fail. Obviously this failure isn't caught by unit tests.
So your idea is to create mahout/lib with the transitive deps, then include
those in the scan? This would avoid the duplication that happens in the job
jars.
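To make the idea concrete, here is a minimal sketch of merging the jars found
by the existing scan with jars from a lib directory, skipping any jar whose
file name has already been seen. It is written in Python purely for
illustration and is not Mahout code; the function name `collect_jars`, the
directory layout, and the match-by-file-name rule are all assumptions.

```python
import os


def collect_jars(scanned_jars, lib_dir):
    """Merge jars from an existing scan with the jars found in lib_dir.

    Hypothetical helper: duplicates are detected by file name only,
    and the first occurrence of a name wins.
    """
    lib_jars = []
    if os.path.isdir(lib_dir):
        # Sort so the result does not depend on directory listing order.
        lib_jars = sorted(
            os.path.join(lib_dir, name)
            for name in os.listdir(lib_dir)
            if name.endswith(".jar")
        )

    seen = set()
    merged = []
    for path in list(scanned_jars) + lib_jars:
        name = os.path.basename(path)
        if name not in seen:
            seen.add(name)
            merged.append(path)
    return merged
```

Matching on file name alone would treat two different versions of the same
library as distinct jars, so a real implementation would likely need a
stricter rule.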
BTW this train of thought implies that the DSL in the shell won't work properly
without the current mrlegacy job jar. The DSL does depend on mrlegacy, right?
> Class dependencies for the spark module are put in a job.jar, which is very
> inefficient
> ---------------------------------------------------------------------------------------
>
> Key: MAHOUT-1636
> URL: https://issues.apache.org/jira/browse/MAHOUT-1636
> Project: Mahout
> Issue Type: Bug
> Components: spark
> Affects Versions: 1.0-snapshot
> Reporter: Pat Ferrel
> Fix For: 1.0-snapshot
>
>
> Using a Maven plugin and an assembly job.xml, a job.jar is created with all
> dependencies, including transitive ones. This job.jar is in
> mahout/spark/target and is included in the classpath when a Spark job is run.
> This allows dependency classes to be found at runtime, but the job.jar
> includes a great deal that is not needed and duplicates classes found in the
> main mrlegacy job.jar. If the job.jar is removed, drivers will not find
> needed classes. A better way of including class dependencies needs to be
> implemented.
> I'm not sure what that better way is so am leaving the assembly alone for
> now. Whoever picks up this Jira will have to remove it after deciding on a
> better method.
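
As a rough way to measure the duplication described above, the sketch below
lists the class entries that two jars have in common. It is Python for
illustration only and not part of Mahout; the function names and any jar paths
passed in are hypothetical. It relies only on the fact that a jar is a zip
archive.

```python
import zipfile


def class_entries(jar_path):
    """Return the set of .class entry names stored in a jar (a zip archive)."""
    with zipfile.ZipFile(jar_path) as jar:
        return {name for name in jar.namelist() if name.endswith(".class")}


def duplicated_classes(jar_a, jar_b):
    """Return the sorted class entries that appear in both jars."""
    return sorted(class_entries(jar_a) & class_entries(jar_b))
```

Calling `duplicated_classes` on the spark job.jar and the mrlegacy job.jar
would give the list of classes that are shipped twice.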