Given that both Ivy and Maven address the issue of dependency
management, I can broadly think of two perspectives (the producer's and
the consumer's), with the other advantages falling under one or the
other.
* The maintainer of the project is responsible for handling
dependencies and upgrading them as the need arises.
As with any other form of encapsulation, the transitive
dependencies (level 2 of the dependency tree) of the primary
dependencies we care about are entirely hidden from us,
since it is the job of the corresponding package owner to
identify and list them (in pom.xml / ivy.xml, as appropriate), and
not that of the user of the package.
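As a rough sketch of what that encapsulation looks like on the consumer
side (the organisation and module names here are illustrative, not taken
from any real project's metadata), an ivy.xml only ever declares the
primary dependency; Ivy walks the publisher's own descriptor for the rest:

```xml
<!-- Illustrative consumer ivy.xml: only the direct dependency is declared. -->
<ivy-module version="2.0">
  <info organisation="com.example" module="my-app"/>
  <dependencies>
    <!-- transitive resolution (the default) pulls in whatever libthrift's
         own descriptor declares; we never enumerate its children here. -->
    <dependency org="org.apache.thrift" name="libthrift" rev="0.2.0"/>
  </dependencies>
</ivy-module>
```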
With appropriate test suites in place, it is easy to flip to and
test-drive a new version before deciding to upgrade or revert, especially
when a project publishes more than one artifact.
* As a consumer of the project, if the maintainer publishes the artifacts
along with their dependency metadata, it becomes easy to assemble the
blocks with only the primary dependencies listed (without worrying about
the other libraries that might be needed, or encountering nasty
ClassNotFoundErrors at runtime).
I agree that both of these sound very theoretical and ideal. The
straightforward case is when you are setting up a project: you can get
up and running by listing only the dependencies you know your project
needs, without trying to gather the entire transitive list behind the
scenes yourself. With mvn / ivy, that list is already defined by the
package maintainer, and it is best to use it.
This might not make sense for Hadoop, but for a much larger codebase it
would be very useful for consumers to have these blocks 'blessed' by the
original contributors / maintainers, so they can keep up with and
test-drive new versions before actually making a decision one way or
another, as opposed to removing all the old versions and downloading and
adding new versions into the ./lib directory.
Recently, we had an upgrade of the Thrift library, and during the
process there were some discussions about the guts of the Thrift code
using commons-lang for a hashCode implementation (or something similar
to that). While that was definitely informative, as users of the
library it is something that should have been transparent to us, had
such a process already been in place, rather than requiring us to dig
into the internals.
On the other end of the spectrum: currently, anybody planning to get
started with just the client framework of HBase to communicate with the
ecosystem (zk, master, region servers, etc.) has to pull in every
dependency listed by hbase (not everybody has the time / resources to
dig through the sources, figure out the required subset, and restrict
the list to it). Assuming the server setup is complete and the clients
live on a different machine, all that is needed is a scaled-down client
library (and not the giant list) that knows the IPC semantics, without
worrying about the server internals. So, on the publication side, once
ivy/maven-ization is complete, hbase can start publishing different
artifacts that can be used depending on one's needs.
A great candidate for such a case would be the Mahout project, which
uses hbase for one of its algorithm implementations and has no need to
carry the giant load of hbase, say.
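As a sketch of how that split could look on the publication side (the
configuration and artifact names below are hypothetical, not existing
HBase modules), an ivy.xml can declare separate publications so a
client-only consumer never drags in the server-side jars:

```xml
<!-- Hypothetical publisher ivy.xml splitting client and server artifacts. -->
<ivy-module version="2.0">
  <info organisation="org.apache.hbase" module="hbase"/>
  <configurations>
    <conf name="client" description="IPC / client-side classes only"/>
    <conf name="server" extends="client" description="full server stack"/>
  </configurations>
  <publications>
    <artifact name="hbase-client" conf="client"/>
    <artifact name="hbase-server" conf="server"/>
  </publications>
</ivy-module>
```

A consumer like Mahout would then declare its dependency with something
like conf="default->client" and pick up only the client artifact and its
subset of the dependency tree.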
Specifically for hdfs (and Hadoop projects in general), I can definitely
see where the frustration comes from: the necessity to keep up with
snapshots that are in a state of flux / transformation, which in turn
slows down the build. Ironically, that is the right way to keep up with
the dependencies; as consumers of other projects, we can be less
aggressive about how frequently we track upstream, while still having an
easy option to try out new versions as they become available.
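Concretely (a sketch only; the property name here is made up, not an
existing flag), the per-build snapshot re-check could be gated behind a
build property, so that release builds skip the repository round-trip:

```xml
<!-- Hypothetical: key Ivy's snapshot re-check off a property; pass
     -Divy.snapshot.changing=false for release builds to avoid the
     repository hit on every resolve. -->
<dependency org="org.apache.hadoop" name="hadoop-core"
            rev="0.21.0-SNAPSHOT"
            changing="${ivy.snapshot.changing}"/>
```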
As we can see, it is just another ecosystem, and one that thrives when
everybody plays by the rules. Bad / inconsistent poms and ivy files
definitely do exist, as I discovered (on some of the hadoop-xyz
projects). While that can be extremely frustrating, assuming the
community is open to receiving patches that address them, fixing them
will help the ecosystem thrive. Viewing the dependency graph of a given
project with the other stakeholders in the same room (more applicable
within a 'shop' than in 'oss' projects) will bring forth 'ball-of-mud' /
'code duplication' anti-patterns using some graph theory 101 principles,
and give people an idea of where to start when refactoring existing
code. (Ideally it should be a dependency tree, but for all practical
purposes it becomes a dependency graph, for a variety of reasons, some
of them non-technical, as anybody can guess!)
As Mathias pointed out, while maven / ivy solve the dependency
management problem, maven's rigid rules sometimes make the build process
go crazy (especially when migrating from a non-dependency-managed world
with custom scripts / unconventional 'target' lifecycles). I am +1 on
the maven part for HBase, since I do not see the need for such custom
steps if we keep the scope of hbase-core restricted to what it does
today, with other plugins to hbase appearing as separate libraries /
apps as opposed to overloading the codebase with more responsibilities.
A detailed comparison between maven and ivy warrants a separate post
altogether, so I am not going to continue on that here.
I have listed these from my own experience; they do not reflect the
official stance behind the 'ivy'-ization of Hadoop, or of HBase for that
matter. In reality, for the longevity of the codebase, such a 'blessed'
ecosystem helps everyone in it.
--
K K.
On 02/13/2010 01:51 AM, Dhruba Borthakur wrote:
My personal experience is that the ivy-maven-stuff introduced into the
Hadoop build system has tremendously slowed down the Hadoop build process. I
am sure that this disadvantage is offset by some advantages that I am not
aware of. If you could educate me on the top two advantages that accrued to
Hadoop after moving to the new build process, that would be awesome.
thanks a bunch,
dhruba
On Sat, Feb 13, 2010 at 1:44 AM, Kay Kay<kaykay.uni...@gmail.com> wrote:
On 02/13/2010 01:29 AM, Dhruba Borthakur wrote:
From what I understand, the slowness of 'ivy' can be reduced if you can
fetch dependent jars from local ivy server, isn't it?
The problem discussed is an artifact of hbase trying to keep up with the
most recent snapshots of hadoop-core / hdfs / mapred; hence the ivy
resolution is expensive, since every time it hits the mvn repository to
check for the latest snapshot, if any. So the slowness is due to the
necessity of keeping up with the dependencies to identify issues early in
the cycle. Specifically, this can be attributed to the changing="true" in
all the ivy.xml-s in hbase for the hadoop artifacts. I am looking into
making it a configurable option to avoid the expensive build time.
This would not be an issue for an hbase release depending on released
versions of hadoop-core / mapred / common etc.
Both ivy and maven cache the artifacts locally, making the roundtrip
redundant (except for the first time, of course), so this should not be an
issue for people trying to build a release from sources, since it will be
moot by then.
thanks,
dhruba
On Sat, Feb 13, 2010 at 12:25 AM, Kay Kay<kaykay.uni...@gmail.com>
wrote:
Mathias -
I have been using Ivy / Maven interchangeably in different projects for
build management. Both of them clearly have their strong points and
drawbacks. Ivy fits Thrift well because of the nature of the tasks
involved, using external command-line tools (the thrift generators) etc.
As I mentioned before, HBase does not have such cross-cutting maven
goals, as the build lifecycle is pretty straightforward.
In any case, the intention is to publish HBase artifacts and
maintain a smaller core, encouraging contribs to build from the artifacts
as opposed to getting into the codebase.
Once HBase artifacts are published, the contribs / plugins for them
would be free to use ivy (with m2compatible="true") or maven as
appropriate.
Ryan -
The slowness is attributed to the changing="true" in the ivy.xml-s for
all the hadoop-common / -hdfs / -mapreduce snapshots that we are using. I am
facing similar slowness with other mvn hadoop (snapshot) dependencies as
well. In retrospect, that should have been made a configurable flag in
libraries.properties to ease things. Hopefully that is sorted out soon.
On 02/13/2010 12:10 AM, Ryan Rawson wrote:
Would you mind elaborating more? At the moment, most people do not
build hbase, and the POM/jar/publishing thing is orthogonal - those
who wish to build their own projects with ivy and/or ant are free to
do so and not be impacted by our use of maven.
We have ivy, but it doesn't integrate with our IDEs and is rather slow
to build and rebuild.
On Sat, Feb 13, 2010 at 12:03 AM, Mathias Herberts
<mathias.herbe...@gmail.com> wrote:
-1
I think Maven is too complex and will lower the adoption of HBase by
people today willing to build it.
I would suggest using Ivy for dependency management as was done in
Thrift.
Mathias.