[ 
https://issues.apache.org/jira/browse/MAHOUT-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261493#comment-14261493
 ] 

ASF GitHub Bot commented on MAHOUT-1636:
----------------------------------------

Github user dlyubimov commented on the pull request:

    https://github.com/apache/mahout/pull/69#issuecomment-68398253
  
    Also, please use dependency opt-in, not opt-out, while doing this.
    
    If you are saying you don't know how to do it, then I'd say it will have
    to wait until somebody who does know how can push the code in. This is
    standard practice.
    
    Not knowing how to do things is not a good reason to get a suboptimal
    solution in.
    
    On Tue, Dec 30, 2014 at 1:02 PM, Dmitriy Lyubimov <[email protected]> wrote:
    
    > assembly should put all jars _separately_ in a predefined path such as
    > $MAHOUT_HOME/lib.
    >
    > On Tue, Dec 30, 2014 at 12:31 PM, Pat Ferrel <[email protected]>
    > wrote:
    >
    >> Hmm, not using shade and not doing any of the more funky things it
    >> supports. I guess you are talking about creating a trimmed down all-deps
    >> jar (using the assembly maven plugin)?
    >>
    >> Are you asking to exclude Mahout too? That would make the jar nothing
    >> more than lib-managed, right? It would be quite easy to do. The jar would
    >> still need to be a release artifact. And we would still have a huge list of
    >> jars to search in "mahout classpath -spark".
    >>
    >> I'm *not* a build engineer so if someone has a better way of doing this
    >> please speak up.
    >>
    >> —
    >> Reply to this email directly or view it on GitHub
    >> <https://github.com/apache/mahout/pull/69#issuecomment-68395014>.
    >>
    >
    >
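
To make the opt-in idea in this thread concrete, here is a minimal sketch of building a classpath from jars that sit separately in a lib directory, where a jar is included only if it is explicitly listed. The function name, the prefix-matching rule, and the jar names in the usage note are hypothetical; this is not how the `mahout` script actually assembles its classpath.

```python
import os


def build_classpath(lib_dir, wanted_prefixes):
    """Return a classpath string of the jars in lib_dir that were opted in.

    A jar is included only when its file name starts with one of the
    prefixes in wanted_prefixes; everything else is left out by default.
    """
    jars = sorted(
        name
        for name in os.listdir(lib_dir)
        if name.endswith(".jar")
        and any(name.startswith(prefix) for prefix in wanted_prefixes)
    )
    # os.pathsep is ":" on Unix-like systems and ";" on Windows.
    return os.pathsep.join(os.path.join(lib_dir, name) for name in jars)
```

For example, `build_classpath(lib_dir, ["mahout-math", "guava"])` would pick up only jars whose names begin with those two (placeholder) prefixes. With opt-out, by contrast, every new transitive dependency lands on the classpath unless someone remembers to exclude it.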


> Class dependencies for the spark module are put in a job.jar, which is very 
> inefficient
> ---------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1636
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1636
>             Project: Mahout
>          Issue Type: Bug
>          Components: spark
>    Affects Versions: 1.0-snapshot
>            Reporter: Pat Ferrel
>            Assignee: Ted Dunning
>             Fix For: 1.0-snapshot
>
>
> Using a Maven plugin and an assembly job.xml, a job.jar is created with all
> dependencies, including transitive ones. This job.jar is in
> mahout/spark/target and is included in the classpath when a Spark job is run.
> This allows dependency classes to be found at runtime, but the job.jar includes
> a great deal that is not needed and that duplicates classes found in the
> main mrlegacy job.jar. If the job.jar is removed, drivers will not find
> needed classes. A better way of including class dependencies needs to be
> implemented.
> I'm not sure what that better way is, so I am leaving the assembly alone for
> now. Whoever picks up this Jira will have to remove it after deciding on a
> better method.
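
One way to measure the duplication described above is to compare the class entries of the two jars. The following is a rough sketch using only the Python standard library; the function names are hypothetical, and the jar paths passed in are whatever the local build produces.

```python
import zipfile


def class_entries(jar_path):
    """Return the set of .class entry names stored in a jar (a zip file)."""
    with zipfile.ZipFile(jar_path) as jar:
        return {name for name in jar.namelist() if name.endswith(".class")}


def duplicated_classes(jar_a, jar_b):
    """Return a sorted list of class entries present in both jars."""
    return sorted(class_entries(jar_a) & class_entries(jar_b))
```

Calling `duplicated_classes` on the spark job.jar and the mrlegacy job.jar would list the classes that both jars carry, which gives a starting point for deciding which dependencies are worth opting in.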



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
