Given that both Ivy and Maven address the issue of dependency
management, I can broadly think of two perspectives (the producer's and
the consumer's), with the other advantages falling under one or the
other.
* The maintainer of the project is responsible for handling
dependencies and upgrading them as the need arises.
As with any other form of encapsulation, the transitive
dependencies (level 2 of the dependency tree) of the primary
dependencies we care about are entirely hidden from us,
since it is the job of the corresponding package owner to
identify and list them (in pom.xml / ivy.xml, as appropriate), and
not that of the user of the package.
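As a rough sketch of what that encapsulation looks like on the consumer
side (the organisation and module names here are illustrative, not taken
from any real project's metadata), an ivy.xml only ever declares the
primary dependency; Ivy walks the publisher's own descriptor for the rest:

```xml
<!-- Illustrative consumer ivy.xml: only the direct dependency is declared. -->
<ivy-module version="2.0">
  <info organisation="com.example" module="my-app"/>
  <dependencies>
    <!-- transitive resolution (the default) pulls in whatever libthrift's
         own descriptor declares; we never enumerate its children here. -->
    <dependency org="org.apache.thrift" name="libthrift" rev="0.2.0"/>
  </dependencies>
</ivy-module>
```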
With appropriate test suites in place, it is easy to flip to and
test-drive a new version before deciding to upgrade or revert, especially
when a project publishes more than one artifact.
* As a consumer of the project, if the maintainer publishes the artifacts
along with their dependency metadata, it becomes easy to assemble the
blocks with only the primary dependencies listed (without worrying about
the other libraries that might be needed, or encountering nasty
ClassNotFoundErrors at runtime).
I agree that both of these sound very theoretical and ideal. The
straightforward case is when you are setting up a project: you can get
up and running by listing only the dependencies you know your project
needs, without trying to gather the entire transitive list behind the
scenes yourself. With mvn / ivy, that list is already defined by the
package maintainer, and it is best to use it.
This might not make sense for Hadoop, but for a much larger codebase it
would be very useful for consumers to have these blocks 'blessed' by the
original contributors / maintainers, so they can keep up with and
test-drive new versions before actually making a decision one way or
another, as opposed to removing all the old versions and downloading and
adding new versions into the ./lib directory.
Recently, we had an upgrade of the Thrift library, and during the
process there were some discussions about the guts of the Thrift code
using commons-lang for a hashCode implementation (or something similar
to that). While that was definitely informative, as users of the
library it is something that should have been transparent to us, had
such a process already been in place, rather than requiring us to dig
into the internals.
On the other end of the spectrum: currently, anybody planning to get
started with just the client framework of HBase to communicate with the
ecosystem (zk, master, region servers, etc.) has to pull in every
dependency listed by hbase (not everybody has the time / resources to
dig through the sources, figure out the required subset, and restrict
the list to it). Assuming the server setup is complete and the clients
live on a different machine, all that is needed is a scaled-down client
library (and not the giant list) that knows the IPC semantics, without
worrying about the server internals. So, on the publication side, once
ivy/maven-ization is complete, hbase can start publishing different
artifacts that can be used depending on one's needs.
A great candidate for such a case would be the Mahout project, which
uses hbase for one of its algorithm implementations and has no need to
carry the giant load of hbase, say.
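As a sketch of how that split could look on the publication side (the
configuration and artifact names below are hypothetical, not existing
HBase modules), an ivy.xml can declare separate publications so a
client-only consumer never drags in the server-side jars:

```xml
<!-- Hypothetical publisher ivy.xml splitting client and server artifacts. -->
<ivy-module version="2.0">
  <info organisation="org.apache.hbase" module="hbase"/>
  <configurations>
    <conf name="client" description="IPC / client-side classes only"/>
    <conf name="server" extends="client" description="full server stack"/>
  </configurations>
  <publications>
    <artifact name="hbase-client" conf="client"/>
    <artifact name="hbase-server" conf="server"/>
  </publications>
</ivy-module>
```

A consumer like Mahout would then declare its dependency with something
like conf="default->client" and pick up only the client artifact and its
subset of the dependency tree.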
Specifically for hdfs (and Hadoop projects in general), I can definitely
see where the frustration comes from: the necessity to keep up with
snapshots that are in a state of flux / transformation, which in turn
slows down the build. Ironically, that is the right way to keep up with
the dependencies; as consumers of other projects, we can be less
aggressive about how frequently we track upstream, while still having an
easy option to try out new versions as they become available.
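Concretely (a sketch only; the property name here is made up, not an
existing flag), the per-build snapshot re-check could be gated behind a
build property, so that release builds skip the repository round-trip:

```xml
<!-- Hypothetical: key Ivy's snapshot re-check off a property; pass
     -Divy.snapshot.changing=false for release builds to avoid the
     repository hit on every resolve. -->
<dependency org="org.apache.hadoop" name="hadoop-core"
            rev="0.21.0-SNAPSHOT"
            changing="${ivy.snapshot.changing}"/>
```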
As we can see, it is just another ecosystem, and one that thrives when
everybody plays by the rules. Bad / inconsistent poms and ivy files
definitely do exist, as I discovered (on some of the hadoop-xyz
projects). While that can be extremely frustrating, assuming the
community is open to receiving patches that address them, fixing them
will help the ecosystem thrive. Viewing the dependency graph of a given
project with the other stakeholders in the same room (more applicable
within a 'shop' than in 'oss' projects) will bring forth 'ball-of-mud' /
'code duplication' anti-patterns using some graph theory 101 principles,
and give people an idea of where to start when refactoring existing
code. (Ideally it should be a dependency tree, but for all practical
purposes it becomes a dependency graph, for a variety of reasons, some
of them non-technical, as anybody can guess!)
As Mathias pointed out, while maven / ivy solve the dependency
management problem, maven's rigid rules sometimes make the build process
go crazy (especially when migrating from a non-dependency-managed world
with custom scripts / unconventional 'target' lifecycles). I am +1 on
the maven part for HBase, since I do not see the need for such custom
steps if we keep the scope of hbase-core restricted to what it does
today, with other plugins to hbase appearing as separate libraries /
apps as opposed to overloading the codebase with more responsibilities.
A detailed comparison between maven and ivy warrants a separate post
altogether, so I am not going to continue on that here.
I have listed these from my own experience; they do not reflect the
official stance behind the 'ivy'-ization of Hadoop, or of HBase for that
matter. In reality, for the longevity of the codebase, such a 'blessed'
ecosystem helps everyone in it.
--
K K.
On 02/13/2010 01:51 AM, Dhruba Borthakur wrote:
My personal experience is that the ivy-maven-stuff introduced into the
Hadoop build system has tremendously slowed down the Hadoop build process. I
am sure that this disadvantage is offset by some advantages that I am not
aware of. If you could educate me on the top two advantages that accrued to
Hadoop after moving to the new build process, that would be awesome.
thanks a bunch,
dhruba
On Sat, Feb 13, 2010 at 1:44 AM, Kay Kay<kaykay.uni...@gmail.com> wrote:
On 02/13/2010 01:29 AM, Dhruba Borthakur wrote:
From what I understand, the slowness of 'ivy' can be reduced if you can
fetch dependent jars from local ivy server, isn't it?
The problem discussed is an artifact of hbase trying to keep up with the
most recent snapshots of hadoop-core / hdfs / mapred; hence the ivy
resolution is expensive, since every time it hits the mvn repository to
check for the latest snapshot, if any. So the slowness is due to the
necessity of keeping up with the dependencies to identify issues early in
the cycle. Specifically, this can be attributed to the changing="true" in
all the ivy.xml-s in hbase for the hadoop artifacts. I am looking into
making it a configurable option to avoid the expensive build time.
This would not be an issue for an hbase release depending on released
versions of hadoop-core / mapred / common etc.
Both ivy and maven cache the artifacts locally, making the roundtrip
redundant (except for the first time, of course), so this should not be an
issue for people trying to build a release from sources, since it will be
moot by then.
thanks,
dhruba
On Sat, Feb 13, 2010 at 12:25 AM, Kay Kay<kaykay.uni...@gmail.com>
wrote:
Mathias -
I have been using Ivy / Maven interchangeably in different projects for
build management. Both of them clearly have their strong points and
drawbacks. Ivy fits Thrift well because of the nature of the tasks
involved, using external command-line tools (the thrift generators) etc.
As I mentioned before, HBase does not have such cross-cutting maven
goals, as the build lifecycle is pretty straightforward.
In any case, the intention is to publish HBase artifacts and
maintain a smaller core, encouraging contribs to build from the artifacts
as opposed to getting into the codebase.
Once HBase artifacts are published, the contribs / plugins for them
would be free to use ivy (with m2compatible="true") or maven as
appropriate.
Ryan -
The slowness is attributed to the changing="true" in the ivy.xml-s for
all the hadoop-common / -hdfs / -mapreduce snapshots that we are using. I am
facing similar slowness with other mvn hadoop (snapshot) dependencies as
well. In retrospect, that should have been made a configurable flag in
libraries.properties to ease things. Hopefully that is sorted out soon.
On 02/13/2010 12:10 AM, Ryan Rawson wrote:
Would you mind elaborating more? At the moment, most people do not
build hbase, and the POM/jar/publishing thing is orthogonal - those
who wish to build their own projects with ivy and/or ant are free to
do so and not be impacted by our use of maven.
We have ivy, but it doesn't integrate with our IDEs and is rather slow
to build and rebuild.
On Sat, Feb 13, 2010 at 12:03 AM, Mathias Herberts
<mathias.herbe...@gmail.com> wrote:
-1
I think Maven is too complex and will lower the adoption of HBase by
people today willing to build it.
I would suggest using Ivy for dependency management as was done in
Thrift.
Mathias.