[
https://issues.apache.org/jira/browse/HADOOP-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067399#comment-13067399
]
Alejandro Abdelnur commented on HADOOP-6671:
--------------------------------------------
@Eric,
First of all, thanks for volunteering to tackle the RPM/DEB part of the Mavenization.
My initial approach to the patch was heavily based on profiles doing what you
are suggesting. The end result was a very large POM for 'common', with profiles
relying heavily on the order of the plugins to do the right thing (I had to
define all the plugins, even if not used, in the main <build> to ensure the
right order of execution when the profiles are active). The result was a POM
that was difficult to follow and to update (I got bitten a few times while improving it).
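For illustration, a hypothetical fragment (not from the actual patch) of what that pre-declaration looks like:
{code}
<!-- hypothetical fragment: every packaging plugin has to be declared in the
     main <build>, even when unused by default, so that when a profile such as
     -Ptar re-configures it, it still executes in this position within the
     phase it is bound to -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-antrun-plugin</artifactId>
    </plugin>
  </plugins>
</build>
{code}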
My second approach, the current one, is much cleaner in that regard. It fully
leverages the Maven reactor, and build times are not affected. The following
table shows the time taken by common build tasks:
|| Build task || Ant command || Maven command || Ant time || Maven time ||
| *clean* | ant clean | mvn clean | 00:02 | 00:01 * |
| *clean compile* | ant clean compile | mvn clean compile | 00:20 | 00:13 * |
| *clean test-compile* | ant clean test-compile | mvn clean test -DskipTests | 00:23 | 00:17 * |
| *clean 1 test* | ant clean test -Dtestcase=TestConfiguration | mvn clean test -Dtest=TestConfiguration | 01:09 | *00:27* * |
| *<warm> 1 test* | ant test -Dtestcase=TestConfiguration | mvn test -Dtest=TestConfiguration | 00:52 | *00:11* * |
| *clean jar test-jar* | ant clean jar jar-test | mvn clean package | 00:28 | 00:23 * |
| *clean binary-tar* | ant clean binary | mvn clean package post-site -DskipTests | 00:59 | 00:46 * |
| *clean tar w/docs* | ant clean tar | mvn clean package post-site -DskipTests -Pdocs | N/A | 04:10 |
| *clean tar w/docs/src* | ant clean tar | mvn clean package post-site -DskipTests -Pdocs -Psource | 01:34 * | 05:18 |
Of all these, IMO the *most* interesting improvement is running a single test
(from scratch and with pre-compiled classes). This will be a huge improvement
for development.
That said, we could merge {{hadoop-docs}} into {{hadoop-common}}, using the
'site' phase to wire up all documentation generation (I think this wouldn't
complicate things too much).
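As a rough sketch of what that wiring could look like (the Forrest invocation, profile name and paths below are assumptions, not part of the actual patch):
{code}
<!-- hypothetical sketch: a -Pdocs profile in hadoop-common/pom.xml hooking
     documentation generation into the 'site' phase via maven-antrun-plugin;
     FORREST_HOME and src/docs are assumed locations -->
<profile>
  <id>docs</id>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-antrun-plugin</artifactId>
        <executions>
          <execution>
            <id>generate-docs</id>
            <phase>site</phase>
            <goals>
              <goal>run</goal>
            </goals>
            <configuration>
              <target>
                <!-- run Forrest against the docs sources -->
                <exec executable="${env.FORREST_HOME}/bin/forrest"
                      dir="${basedir}/src/docs" failonerror="true"/>
              </target>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</profile>
{code}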
However, for TAR/RPM/DEB I would like to keep a separate module that kicks in
the assembly plugin to generate the TAR/RPM/DEB, and there we could have
profiles that build a TAR, an RPM and/or a DEB (a skeleton is sketched below,
after the layout).
Another benefit of this is that all the scripts and packaging resources would
end up in the TAR/RPM/DEB module, while the {{hadoop-common}} module would only
produce a JAR file.
The layout would then be:
{code}
trunk/pom.xml
|
|-- hadoop-annotations/pom.xml (javadoc annotations and doclet)
|
|-- hadoop-project/pom.xml (dependency management, extended by all other modules)
|
|-- common/pom.xml
| |
| |-- hadoop-common/pom.xml [clean, compile, package, install, deploy, site] (-Pnative)
| |
| |-- hadoop-common-distro/pom.xml [clean, assembly:single] (-Ptar -Prpm -Pdeb)
|
|-- hdfs
|
|-- mapreduce
{code}
The [...] are the meaningful lifecycle phases.
The (-P...) are the profiles each module would support.
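To make the distro idea concrete, here is a minimal sketch of how a packaging profile in {{hadoop-common-distro}} could look (the descriptor path and execution id are assumptions; the RPM and DEB profiles would follow the same pattern with their own packaging plugins):
{code}
<!-- hypothetical sketch of hadoop-common-distro/pom.xml profiles: each
     packaging format is enabled independently via its own profile -->
<profiles>
  <profile>
    <id>tar</id>
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-assembly-plugin</artifactId>
          <executions>
            <execution>
              <id>dist-layout</id>
              <phase>prepare-package</phase>
              <goals>
                <goal>single</goal>
              </goals>
              <configuration>
                <!-- assumed descriptor path; lays the distro out as a
                     directory, the actual TAR is created afterwards -->
                <descriptors>
                  <descriptor>src/main/assemblies/hadoop-tar.xml</descriptor>
                </descriptors>
                <formats>
                  <format>dir</format>
                </formats>
              </configuration>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>
  </profile>
  <!-- 'rpm' and 'deb' profiles would plug in the corresponding packaging
       plugins here -->
</profiles>
{code}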
The only thing we have to sort out is how to wire the maven-antrun-plugin to
run after the 'assembly:single' invocation. This is required so we can invoke
Unix tar to create the TAR and preserve the symlinks.
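One possible way to get that ordering, assuming the assembly execution is bound to {{prepare-package}} as in the sketch above, is to bind the antrun execution to {{package}}, which Maven always runs afterwards (the assembled directory name below is assumed):
{code}
<!-- hypothetical continuation of the 'tar' profile above: assembly:single runs
     in prepare-package, this execution runs in package, so the ordering is
     guaranteed; Unix tar is used so symlinks are preserved -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-antrun-plugin</artifactId>
  <executions>
    <execution>
      <id>tar-dist</id>
      <phase>package</phase>
      <goals>
        <goal>run</goal>
      </goals>
      <configuration>
        <target>
          <!-- directory name assumed to match the assembly 'dir' output -->
          <exec executable="tar" dir="${project.build.directory}" failonerror="true">
            <arg value="czf"/>
            <arg value="${project.artifactId}-${project.version}.tar.gz"/>
            <arg value="${project.artifactId}-${project.version}"/>
          </exec>
        </target>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}
With something along those lines, running {{mvn clean package -Ptar}} in the distro module would produce the TAR while keeping symlinks intact.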
Would you be OK with this approach?
Thoughts?
PS: I'm somewhat familiar with HBase packaging, and the current overloading of
Maven phases and profiles makes things too slow (until not long ago, not sure
if still valid, running 'mvn install' was generating the TAR).
> To use maven for hadoop common builds
> -------------------------------------
>
> Key: HADOOP-6671
> URL: https://issues.apache.org/jira/browse/HADOOP-6671
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: build
> Affects Versions: 0.22.0
> Reporter: Giridharan Kesavan
> Assignee: Alejandro Abdelnur
> Attachments: HADOOP-6671-cross-project-HDFS.patch,
> HADOOP-6671-e.patch, HADOOP-6671-f.patch, HADOOP-6671-g.patch,
> HADOOP-6671-h.patch, HADOOP-6671-i.patch, HADOOP-6671-j.patch,
> HADOOP-6671-k.sh, HADOOP-6671-l.patch, HADOOP-6671-m.patch,
> HADOOP-6671-n.patch, HADOOP-6671-o.patch, HADOOP-6671-p.patch,
> HADOOP-6671-q.patch, HADOOP-6671.patch, HADOOP-6671b.patch,
> HADOOP-6671c.patch, HADOOP-6671d.patch, build.png, common-mvn-layout-i.sh,
> hadoop-commons-maven.patch, mvn-layout-e.sh, mvn-layout-f.sh,
> mvn-layout-k.sh, mvn-layout-l.sh, mvn-layout-m.sh, mvn-layout-n.sh,
> mvn-layout-o.sh, mvn-layout-p.sh, mvn-layout-q.sh, mvn-layout.sh,
> mvn-layout.sh, mvn-layout2.sh, mvn-layout2.sh
>
>
> We are now able to publish hadoop artifacts to the maven repo successfully
> [Hadoop-6382]
> Drawbacks with the current approach:
> * Use ivy for dependency management with ivy.xml
> * Use maven-ant-task for artifact publishing to the maven repository
> * pom files are not generated dynamically
> To address this I propose we use maven to build hadoop-common, which would
> help us to manage dependencies, publish artifacts and have one single xml
> file(POM) for dependency management and artifact publishing.
> I would like to have a branch created to work on mavenizing hadoop common.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira