[ https://issues.apache.org/jira/browse/MAHOUT-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012553#comment-13012553 ]

Dmitriy Lyubimov commented on MAHOUT-622:
-----------------------------------------

bq. In fact there are a load of excludes there. I understand it's needed if you 
want to perhaps exclude some transitive dependency and override it, but, it 
seems like we're excluding a load of things that aren't 
relevant. Do they "hurt" enough that we have to keep up with what Hadoop 
includes that we don't use?

In short, at present I think the answer is 'yes': we have to keep stripping out
the stuff we don't use. But I think Hadoop 0.21 has reworked this, so we will
have to revisit it then.

Exclusions are the things we prune out of the transitive dependency tree. As far
as I understand, the reasoning is that we never class-load anything from those
artifacts on any code path, either directly or through Hadoop, so we don't need
them in our final assembly. That is also the risk: imagine Hadoop changes and,
for some reason, the client implementation starts loading classes from KosmosFS
or elsewhere.
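
For illustration, this is roughly what such pruning looks like in a pom. It is a
sketch, not Mahout's actual pom: the excluded coordinates (KosmosFS client, Jetty)
are assumptions and should be checked against the real hadoop-core POM. If Hadoop
ever did load a class from an excluded artifact, the failure would only show up at
run time, as a NoClassDefFoundError or ClassNotFoundException.

```xml
<!-- Sketch only: the excluded artifacts' coordinates are assumptions -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>0.20.2</version>
  <exclusions>
    <!-- KosmosFS client: assumed to be needed only when a KFS filesystem is used -->
    <exclusion>
      <groupId>net.sf.kosmosfs</groupId>
      <artifactId>kfs</artifactId>
    </exclusion>
    <!-- Jetty: assumed to be a server-side piece a job driver does not load -->
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>jetty</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```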

One of the better practices I have come across for handling this is to make those
dependencies optional (i.e. 'excluded by default'). Spring is a good example here.
It is compiled against a boatload of technologies, and it's quite unlikely you'd
use all of them. So they declare them optional, but you can still inherit their
versions (sometimes, not always, as Spring might do some Maven work there too,
IMO), and if you need one of them, you just redeclare it without a version.
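
A minimal sketch of the optional pattern, with hypothetical coordinates
(com.example:example-backend is made up for illustration). The library marks the
dependency optional, so it is not passed on transitively, and a consumer that
actually needs it opts back in by redeclaring it.

```xml
<!-- Library pom (hypothetical): compiled against the backend, but consumers
     do not get it transitively because it is marked optional -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>example-backend</artifactId>
  <version>1.2.3</version>
  <optional>true</optional>
</dependency>

<!-- Consumer pom: opts back in by redeclaring it. Leaving out the version only
     works if it is managed somewhere the consumer sees (for example an imported
     dependencyManagement section); otherwise the version must be stated. -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>example-backend</artifactId>
</dependency>
```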

But sometimes there's a problem: the same project may need two scopes, one that
is 'API only' with everything else excluded, and another that is API plus the full
implementation, and there's no way to separate them cleanly under the current Maven
model. So when you want only the 'API' part, you incur tons of excludes (kind of
like what we have).

I think the Hadoop community is now leaning more and more toward a practice where
client (driver) code is compiled against the API only (such as 0.20.2), finds the
local Hadoop client libraries at runtime and "throws" them into the MR job
automatically. E.g. Pig 0.7.0 used to include all the Hadoop jars and hence would
run only with a fixed version of Hadoop, but 0.8.0 doesn't have all those jars
'onboard' and instead requires a local Hadoop install. In that sense, I think it
would make sense for Hadoop to declare its API artifacts transitively (the ones
the driver needs) and the rest as 'optional'. We can look at Pig 0.8.0 to see what
they do with Hadoop deps in that regard (although I am not sure whether they
actually have a Maven build; they might use plain Ant, I can't remember).
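
In Maven terms, one way to express "compile against the Hadoop API, let the local
install supply the jars at run time" is the provided scope. This is a sketch, not
necessarily what Pig does. A provided dependency is on the compile and test
classpaths but is not transitive, and an assembly dependencySet with the default
runtime scope does not bundle it.

```xml
<!-- Sketch: the driver compiles against Hadoop, but the cluster's local
     Hadoop installation is expected to provide the jars at run time -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>0.20.2</version>
  <!-- provided: compile/test classpath only, not passed on to dependents -->
  <scope>provided</scope>
</dependency>
```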

> Mahout dependencies are unified under dependency management in parent pom
> -------------------------------------------------------------------------
>
>                 Key: MAHOUT-622
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-622
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.4
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>            Priority: Minor
>              Labels: build, maven, pom
>             Fix For: 0.5
>
>         Attachments: MAHOUT-622.patch, MAHOUT-622.patch
>
>
> As far as I understand, Maven encourages the "best practice" of a unified view
> of dependency versions, specified under <dependencyManagement> (usually in a
> parent pom) instead of under <dependencies>.
> In Mahout, this practice is only partially followed. Some dependencies have
> concrete versions under the <dependencies> tag in submodule poms. The proposed
> change is to hunt those down and move the version declarations into the parent pom.
> As far as I understand, this achieves two things:
> * The Mahout assembly would include the same versions for all modules, thus
> ensuring runtime module dependencies are the same as at compile time;
> * Somebody who uses Mahout as a dependency could import Mahout's managed
> dependency versions using the <scope>import</scope> spec, thus inheriting
> Mahout's versions for shared dependencies.
> For the most part the change would be nominal, although in certain cases we'd
> need to sort out cross-module conflicts (if any). Commons-math was one; I'm not
> sure if there are more. If there are none, the changes would be rather
> mechanical.
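
A sketch of the three pieces the proposal describes: the parent pom owning the
version, a module declaring the dependency without one, and a downstream project
importing the managed versions. The commons-math version and the coordinates of
the imported Mahout pom (org.apache.mahout:mahout-parent) are assumptions for
illustration. An import-scoped dependency must have type pom, and it brings in
only the dependencyManagement section of that pom.

```xml
<!-- 1. Parent pom.xml: the single place where the version lives
        (version shown is illustrative) -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-math</artifactId>
      <version>2.1</version>
    </dependency>
  </dependencies>
</dependencyManagement>

<!-- 2. Module pom.xml: no version; Maven takes it from the parent -->
<dependencies>
  <dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-math</artifactId>
  </dependency>
</dependencies>

<!-- 3. Downstream project pom.xml: imports the managed versions
        (coordinates of the imported pom are hypothetical) -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-parent</artifactId>
      <version>0.5</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```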

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
