[ https://issues.apache.org/jira/browse/MAHOUT-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261263#comment-14261263 ]

ASF GitHub Bot commented on MAHOUT-1636:
----------------------------------------

Github user pferrel commented on the pull request:

    https://github.com/apache/mahout/pull/69#issuecomment-68375633
  
    The minimum piece to push is in this PR, so it will go in soon, but the PR
    will remain active until the following issues are addressed:
    
    1. Can we change the jar scan so that it finds only the dependencies.jar for
    any Spark driver, or perhaps even the shell? I think this should work, but it
    will have to be tested thoroughly on a running system, since missing-class
    errors are not caught by unit tests.
    2. With any dependencies.jar, we face the question of which artifacts to
    publish. In theory, the dependencies.jar contains everything needed for Spark
    Mahout, so if #1 proves viable, the dependencies.jar could perhaps be renamed
    and treated as the release artifact for the Spark flavor of Mahout. This is a
    bit beyond my understanding, so it's important to get someone who understands
    the whole release process to look at the question.


> Class dependencies for the spark module are put in a job.jar, which is very inefficient
> ---------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1636
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1636
>             Project: Mahout
>          Issue Type: Bug
>          Components: spark
>    Affects Versions: 1.0-snapshot
>            Reporter: Pat Ferrel
>            Assignee: Ted Dunning
>             Fix For: 1.0-snapshot
>
>
> Using a Maven plugin and an assembly job.xml, a job.jar is created with all
> dependencies, including transitive ones. This job.jar is in
> mahout/spark/target and is included in the classpath when a Spark job is run.
> This allows dependency classes to be found at runtime, but the job.jar includes
> a great deal of unneeded classes that duplicate those in the main mrlegacy
> job.jar. If the job.jar is removed, drivers will not find the classes they
> need. A better way of including class dependencies needs to be implemented.
> I'm not sure what that better way is, so I am leaving the assembly alone for
> now. Whoever picks up this Jira will have to remove it after deciding on a
> better method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
