[
https://issues.apache.org/jira/browse/MAHOUT-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012553#comment-13012553
]
Dmitriy Lyubimov commented on MAHOUT-622:
-----------------------------------------
bq. In fact there are a load of excludes there. I understand it's needed if you
want to perhaps exclude some transitive dependency and override it, but, it
seems like we're excluding a load of things that aren't
relevant. Do they "hurt" enough that we have to keep up with what Hadoop
includes that we don't use?

In short, at present, I think the answer is 'yes': we have to keep up with
stripping out the stuff that we don't use. But I think Hadoop 0.21 has reworked
this, so we will have to revisit it then.

Exclusions are the things we prune out of the transitive dependency tree. As
far as I understand, the reason is that we never class-load any of those
classes on any code path, either directly or through our use of Hadoop, so we
don't need them in our final assembly. That's also the risk (imagine Hadoop
changes and, for some reason, the client implementation starts loading classes
from KosmosFS or something else).
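
For illustration, pruning one such transitive dependency looks roughly like the
fragment below. This is only a sketch: the KosmosFS coordinates are my
assumption of how it appears in the Hadoop 0.20.2 tree and should be checked
against the output of mvn dependency:tree before relying on them.

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>0.20.2</version>
  <exclusions>
    <!-- Assumed coordinates for the KosmosFS client; verify with
         mvn dependency:tree. Excluding it keeps the jar out of the
         final assembly, which is safe only as long as nothing on our
         code path ever loads classes from it. -->
    <exclusion>
      <groupId>net.sf.kosmosfs</groupId>
      <artifactId>kfs</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```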

One of the better practices I have come across for handling that is to make
those dependencies optional (i.e. 'excluded by default'). Spring is a good
example in this sense. It is compiled against a boatload of technologies and
it's quite unlikely you'd use all of them. So they declare them optional, but
you can still inherit their versions (sometimes, not always, as Spring might
use some Maven work there too, IMO), and if you need them, you just redeclare
them without a version.
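
A minimal sketch of the 'optional' approach, using made-up coordinates
(com.example:heavy-backend is hypothetical, not a real Spring or Hadoop
artifact). The library declares the dependency optional, so consumers do not
get it transitively:

```xml
<!-- In the library's own pom: compiled against it, but not passed on
     to consumers transitively. Coordinates are hypothetical. -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>heavy-backend</artifactId>
  <version>1.0</version>
  <optional>true</optional>
</dependency>
```

A consumer that actually needs it redeclares it. Leaving the version out works
only if the consumer's build also gets that version from somewhere, for example
an inherited or imported <dependencyManagement> section:

```xml
<!-- In the consumer's pom: opt back in explicitly. The missing version
     is assumed to come from an inherited or imported
     dependencyManagement section. -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>heavy-backend</artifactId>
</dependency>
```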

But sometimes there's a problem, in the sense that the same project might need
2 scopes: one for 'api only' with everything else excluded, and another for
api + full implementation, and there's no way to separate them cleanly under
the current Maven model. So when you want 'api only', you incur tons of
excludes (kind of like what we have).

I think the Hadoop community is now leaning more and more toward a practice
where client (driver) code is compiled against the api only (such as 0.20.2)
and can find the local Hadoop client libraries at runtime and "throw" them into
the MR job automatically. E.g. Pig 0.7.0 used to include all the Hadoop jars
and hence would run only with a fixed version of Hadoop, but 0.8.0 doesn't have
all those jars 'onboard' and instead requires a local Hadoop install. In that
sense, I think, it would make sense for Hadoop to declare their api stuff
transitively (necessary for the driver only) and the rest of it 'optional'. We
can look at Pig 0.8.0 to see what they do with Hadoop deps in that regard
(although I am not sure whether they actually have a Maven build; they might
have a regular Ant one, I can't remember).

> Mahout dependencies are unified under dependency management in parent pom
> -------------------------------------------------------------------------
>
> Key: MAHOUT-622
> URL: https://issues.apache.org/jira/browse/MAHOUT-622
> Project: Mahout
> Issue Type: Improvement
> Affects Versions: 0.4
> Reporter: Dmitriy Lyubimov
> Assignee: Dmitriy Lyubimov
> Priority: Minor
> Labels: build, maven, pom
> Fix For: 0.5
>
> Attachments: MAHOUT-622.patch, MAHOUT-622.patch
>
>
> As far as I understand, Maven encourages the "best practice" of keeping a
> unified view of dependency versions under <dependencyManagement>, usually in
> a parent pom, instead of under <dependencies>.
> In Mahout, this practice is only partially followed. Some dependencies have
> concrete versions under the <dependencies> tag in submodule poms. The proposed
> change is to raid those and move the version declarations into the parent pom.
> This (as far as I understand) achieves 2 things:
> * The Mahout assembly would include the same versions for all modules, thus
> ensuring runtime module dependencies are the same as at compile time;
> * Somebody who uses Mahout as a dependency could import Mahout dependencies
> using the <scope>import</scope> spec, thus inheriting Mahout's versions for
> shared dependencies.
> For the most part the change would be nominal, although in certain cases we'd
> need to sort through cross-module conflicts (if any). Commons-math was one;
> not sure if there are more. If there are none, the changes would be rather
> mechanical.
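
To make the proposal in the description concrete, here is a rough sketch of the
pattern. The commons-math version is a placeholder for illustration, not
necessarily what the patch pins. The parent pom holds the version, and a
submodule declares the dependency without one:

```xml
<!-- Parent pom: the single place where the version is pinned.
     The version number here is illustrative only. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-math</artifactId>
      <version>2.1</version>
    </dependency>
  </dependencies>
</dependencyManagement>

<!-- Submodule pom: no version, it is inherited from the parent. -->
<dependencies>
  <dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-math</artifactId>
  </dependency>
</dependencies>
```

A project that uses Mahout could then pull in the same managed versions with an
import-scoped entry. This assumes the parent pom is published as
org.apache.mahout:mahout, which should be verified:

```xml
<!-- Consumer pom: import Mahout's managed versions. The import scope is
     only valid inside dependencyManagement and requires type pom.
     The artifactId of the parent pom is an assumption. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout</artifactId>
      <version>0.5</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```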