[ 
https://issues.apache.org/jira/browse/HADOOP-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067399#comment-13067399
 ] 

Alejandro Abdelnur commented on HADOOP-6671:
--------------------------------------------

@Eric,

First of all, thanks for volunteering to tackle the Mavenization RPM/DEB.

My initial approach to the patch was heavily based on profiles doing what you 
are suggesting. The end result was a very large POM for 'common' with profiles 
relying heavily on the ordering of plugins to do the right thing (I had to 
define all the plugins, even the unused ones, in the main <build> to ensure the 
right order of execution when the profiles are active). The result was a POM 
that was difficult to follow and to update (it bit me a few times while I was 
improving it).

My second approach, the current one, is much cleaner in that regard. It fully 
leverages the Maven reactor, and build times are not affected. The following 
table shows the time taken by common build tasks:


|| Build task            || Ant command                                || Maven command                                           || Ant time || Maven time ||
| *clean*                | ant clean                                   | mvn clean                                               | 00:02    | 00:01  *    |
| *clean compile*        | ant clean compile                           | mvn clean compile                                       | 00:20    | 00:13  *    |
| *clean test-compile*   | ant clean test-compile                      | mvn clean test -DskipTests                              | 00:23    | 00:17  *    |
| *clean 1 test*         | ant clean test -Dtestcase=TestConfiguration | mvn clean test -Dtest=TestConfiguration                 | 01:09    | *00:27* *   |
| *<warm> 1 test*        | ant test -Dtestcase=TestConfiguration       | mvn test -Dtest=TestConfiguration                       | 00:52    | *00:11* *   |
| *clean jar test-jar*   | ant clean jar jar-test                      | mvn clean package                                       | 00:28    | 00:23  *    |
| *clean binary-tar*     | ant clean binary                            | mvn clean package post-site -DskipTests                 | 00:59    | 00:46  *    |
| *clean tar w/docs*     | ant clean tar                               | mvn clean package post-site -DskipTests -Pdocs          | N/A      | 04:10       |
| *clean tar w/docs/src* | ant clean tar                               | mvn clean package post-site -DskipTests -Pdocs -Psource | 01:34 *  | 05:18       |

Of all these, IMO the *most* interesting improvement is running a single test 
(both from scratch and with pre-compiled classes). This will be a huge win for 
development.

That said, we could merge {{hadoop-docs}} into {{hadoop-common}}, using the 
'site' phase to wire up all documentation generation (I don't think this would 
complicate things too much).
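
A minimal sketch of what that wiring could look like, binding doc generation to the 'site' phase via the maven-antrun-plugin (the Forrest invocation and paths here are assumptions for illustration, not the actual patch):

{code}
<!-- in hadoop-common/pom.xml: run doc generation during the 'site' phase -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-antrun-plugin</artifactId>
  <executions>
    <execution>
      <id>docs</id>
      <phase>site</phase>
      <goals>
        <goal>run</goal>
      </goals>
      <configuration>
        <target>
          <!-- hypothetical: call the existing Forrest doc build -->
          <exec executable="forrest" dir="${basedir}/src/docs"/>
        </target>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}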

However, for TAR/RPM/DEB I would like to keep a separate module that kicks in 
the assembly plugin to generate the TAR/RPM/DEB. There we could have profiles 
that build a TAR, an RPM and/or a DEB.

Another benefit of this is that all the scripts and related files would end up 
in the TAR/RPM/DEB module, while the {{hadoop-common}} module would only 
produce a JAR file.

The layout would then be:

{code}
trunk/pom.xml
|
|-- hadoop-annotations/pom.xml (javadoc annotations and doclet)
|
|-- hadoop-project/pom.xml (dependency management, extended by all other modules)
|
|-- common/pom.xml
|      |
|      |-- hadoop-common/pom.xml [clean, compile,package,install,deploy,site] 
(-Pnative)
|      |
|      |-- hadoop-common-distro/pom.xml [clean, assembly:single] (-Ptar -Prpm 
-Pdeb)
|
|-- hdfs
|
|-- mapreduce
{code}

The [...] are the meaningful lifecycle phases.

The (-P...) are the profiles each module would support.
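
For illustration, a {{-Ptar}} profile in the distro module might look roughly like this (the descriptor location and profile ids are hypothetical, only a sketch of the idea):

{code}
<!-- in hadoop-common-distro/pom.xml -->
<profiles>
  <profile>
    <id>tar</id>
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-assembly-plugin</artifactId>
          <configuration>
            <descriptors>
              <!-- hypothetical descriptor path -->
              <descriptor>src/main/assembly/tar.xml</descriptor>
            </descriptors>
          </configuration>
        </plugin>
      </plugins>
    </build>
  </profile>
  <!-- analogous 'rpm' and 'deb' profiles would go here -->
</profiles>
{code}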

The only thing we have to sort out is how to wire the maven-antrun-plugin to 
run after the 'assembly:single' invocation. This is needed so we can invoke the 
Unix tar command to create the TAR, which preserves the symlinks.
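
One possible way to get that ordering (a sketch under assumptions, not tested against the patch): bind both plugins to the {{package}} phase and declare antrun second, since executions bound to the same phase run in declaration order; the assembly plugin produces an exploded directory and antrun then shells out to Unix tar, because Ant's own tar task does not preserve symlinks:

{code}
<!-- both bound to 'package'; same-phase executions run in declaration order -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
      <configuration>
        <!-- exploded directory instead of a tar, so symlinks can still be fixed up -->
        <formats>
          <format>dir</format>
        </formats>
      </configuration>
    </execution>
  </executions>
</plugin>
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-antrun-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>run</goal>
      </goals>
      <configuration>
        <target>
          <!-- shell out to Unix tar to preserve symlinks -->
          <exec executable="tar" dir="${project.build.directory}">
            <arg line="czf ${project.artifactId}-${project.version}.tar.gz ${project.artifactId}-${project.version}"/>
          </exec>
        </target>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}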

Would you be OK with this approach? 

Thoughts?

PS: I'm somewhat familiar with HBase packaging, and its current overloading of 
Maven phases and profiles makes things too slow (until not long ago, and I'm 
not sure if this is still the case, running 'mvn install' was generating the 
TAR).


> To use maven for hadoop common builds
> -------------------------------------
>
>                 Key: HADOOP-6671
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6671
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: build
>    Affects Versions: 0.22.0
>            Reporter: Giridharan Kesavan
>            Assignee: Alejandro Abdelnur
>         Attachments: HADOOP-6671-cross-project-HDFS.patch, 
> HADOOP-6671-e.patch, HADOOP-6671-f.patch, HADOOP-6671-g.patch, 
> HADOOP-6671-h.patch, HADOOP-6671-i.patch, HADOOP-6671-j.patch, 
> HADOOP-6671-k.sh, HADOOP-6671-l.patch, HADOOP-6671-m.patch, 
> HADOOP-6671-n.patch, HADOOP-6671-o.patch, HADOOP-6671-p.patch, 
> HADOOP-6671-q.patch, HADOOP-6671.patch, HADOOP-6671b.patch, 
> HADOOP-6671c.patch, HADOOP-6671d.patch, build.png, common-mvn-layout-i.sh, 
> hadoop-commons-maven.patch, mvn-layout-e.sh, mvn-layout-f.sh, 
> mvn-layout-k.sh, mvn-layout-l.sh, mvn-layout-m.sh, mvn-layout-n.sh, 
> mvn-layout-o.sh, mvn-layout-p.sh, mvn-layout-q.sh, mvn-layout.sh, 
> mvn-layout.sh, mvn-layout2.sh, mvn-layout2.sh
>
>
> We are now able to publish hadoop artifacts to the maven repo successfully [ 
> Hadoop-6382]
> Drawbacks with the current approach:
> * Use ivy for dependency management with ivy.xml
> * Use maven-ant-task for artifact publishing to the maven repository
> * pom files are not generated dynamically 
> To address this I propose we use maven to build hadoop-common, which would 
> help us to manage dependencies, publish artifacts and have one single xml 
> file(POM) for dependency management and artifact publishing.
> I would like to have a branch created to work on mavenizing  hadoop common.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
