[
https://issues.apache.org/jira/browse/MAHOUT-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261805#comment-14261805
]
ASF GitHub Bot commented on MAHOUT-1636:
----------------------------------------
Github user pferrel commented on the pull request:
https://github.com/apache/mahout/pull/69#issuecomment-68418688
Why is there no lib dir in Mahout? If you get the distribution tar you get
a lib; if you build from source you don't.
Apparently whoever originally designed the build process thought that the
job jar was good enough since it included dependencies and was needed for
hadoop.
The snapshot distribution tar for Mahout has libs but none of the Spark or
H2O libs. There are some mahout jars, but the DSL, spark and h2o jars
are missing there, so it's seriously broken.
Sooo, we need to:
1. enable the lib construction in build-from-source.
2. make all the new modules put their jars in the distribution tar.
#1 will probably remove the need for another assembly. Where is our build
engineer? We wouldn't want a suboptimal solution since it's so close to optimal
as it is.
> Class dependencies for the spark module are put in a job.jar, which is very
> inefficient
> ---------------------------------------------------------------------------------------
>
> Key: MAHOUT-1636
> URL: https://issues.apache.org/jira/browse/MAHOUT-1636
> Project: Mahout
> Issue Type: Bug
> Components: spark
> Affects Versions: 1.0-snapshot
> Reporter: Pat Ferrel
> Assignee: Ted Dunning
> Fix For: 1.0-snapshot
>
>
> Using a Maven plugin and an assembly job.xml, a job.jar is created with all
> dependencies, including transitive ones. This job.jar is in
> mahout/spark/target and is included in the classpath when a Spark job is run.
> This allows dependency classes to be found at runtime, but the job.jar includes
> a great many unneeded classes that duplicate those found in the
> main mrlegacy job.jar. If the job.jar is removed, drivers will not find
> needed classes. A better way needs to be implemented for including class
> dependencies.
> I'm not sure what that better way is so am leaving the assembly alone for
> now. Whoever picks up this Jira will have to remove it after deciding on a
> better method.
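One way to gauge how much the spark job.jar duplicates the mrlegacy job.jar, as the description above claims, is to compare the class entries of the two archives. The following is only a minimal sketch: the helper names `class_entries` and `duplicated_classes` are made up for illustration, and the jar paths you pass in are whatever your own build produces (the exact file names are not given in this issue).

```python
import zipfile


def class_entries(jar_path):
    """Return the set of .class entry names in a jar (a jar is a zip archive)."""
    with zipfile.ZipFile(jar_path) as jar:
        return {name for name in jar.namelist() if name.endswith(".class")}


def duplicated_classes(jar_a, jar_b):
    """Return the sorted .class entries that appear in both jars."""
    return sorted(class_entries(jar_a) & class_entries(jar_b))
```

Calling `duplicated_classes` with the path of the job.jar under mahout/spark/target and the path of the mrlegacy job.jar returns the overlapping class files. It compares entry names only, so it does not tell you whether the duplicated classes are the same version.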
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)