[
https://issues.apache.org/jira/browse/MAHOUT-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257637#comment-14257637
]
Dmitriy Lyubimov commented on MAHOUT-1636:
------------------------------------------
bq. The external things I use are scopt used by the frontend...Since the
frontend fails first not sure...
Then the first step is to deal with scopt, which raises the question: why do
the existing dependencies for parsing command lines (such as commons-cli, I
guess) not fulfill the objective?
If (and it is a very big if) scopt is needed, then yes, perhaps there needs to
be a separate assembly pom (with type pom) that does nothing but copy the
dependencies (as separate jars!) under /lib. At least that's how it is
normally handled with Maven. This needs to be thoroughly filtered for Spark
deps, though.
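A rough sketch of what such a dependency-copying pom could look like, using
the maven-dependency-plugin copy-dependencies goal. The artifactId, the scopt
version, and the choice of excluding only the org.apache.spark group are
assumptions for illustration, not the actual Mahout build:

```xml
<!-- Hypothetical module: packaging "pom", so it builds no jar of its own. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-spark-deps</artifactId> <!-- assumed name -->
  <version>1.0-SNAPSHOT</version>
  <packaging>pom</packaging>

  <dependencies>
    <!-- The external dependency under discussion; version is a placeholder. -->
    <dependency>
      <groupId>com.github.scopt</groupId>
      <artifactId>scopt_2.10</artifactId>
      <version>3.2.0</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-dependency-plugin</artifactId>
        <executions>
          <execution>
            <id>copy-deps-to-lib</id>
            <phase>package</phase>
            <goals>
              <goal>copy-dependencies</goal>
            </goals>
            <configuration>
              <!-- Each dependency lands as its own jar under target/lib. -->
              <outputDirectory>${project.build.directory}/lib</outputDirectory>
              <includeScope>runtime</includeScope>
              <!-- Filter out Spark artifacts, which the cluster already provides. -->
              <excludeGroupIds>org.apache.spark</excludeGroupIds>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
```

Excluding a single groupId is likely not enough in practice: anything Spark
pulls in transitively under other group ids would need its own exclusion or a
"provided" scope.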
I have another, less relevant question: which module do the "drivers" (i.e.
the CLI) go into? math or math-scala is definitely not the place for them, and
spark is a Spark-specific module. I.e., if you are placing drivers into the
"spark" module, then all they will ever be able to handle are Spark jobs. Well,
perhaps that's the idea, dunno. Still, it raises the question of how the CLI is
to be handled independently of the backend.
> Class dependencies for the spark module are put in a job.jar, which is very
> inefficient
> ---------------------------------------------------------------------------------------
>
> Key: MAHOUT-1636
> URL: https://issues.apache.org/jira/browse/MAHOUT-1636
> Project: Mahout
> Issue Type: Bug
> Components: spark
> Affects Versions: 1.0-snapshot
> Reporter: Pat Ferrel
> Fix For: 1.0-snapshot
>
>
> Using a Maven plugin and an assembly job.xml, a job.jar is created with all
> dependencies, including transitive ones. This job.jar is in
> mahout/spark/target and is included in the classpath when a Spark job is run.
> This allows dependency classes to be found at runtime, but the job.jar
> includes a great many things that are not needed and that duplicate classes
> found in the main mrlegacy job.jar. If the job.jar is removed, drivers will
> not find needed classes. A better way needs to be implemented for including
> class dependencies.
> I'm not sure what that better way is so am leaving the assembly alone for
> now. Whoever picks up this Jira will have to remove it after deciding on a
> better method.