[
https://issues.apache.org/jira/browse/HADOOP-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801641#comment-17801641
]
ASF GitHub Bot commented on HADOOP-19019:
-----------------------------------------
JiaLiangC commented on code in PR #6373:
URL: https://github.com/apache/hadoop/pull/6373#discussion_r1439125229
##########
hadoop-yarn-project/pom.xml:
##########
@@ -90,6 +91,56 @@
<artifactId>hadoop-yarn-applications-catalog-webapp</artifactId>
<type>war</type>
</dependency>
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
Review Comment:
Yes, the scope here should be defined as 'provided'.
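For illustration, a minimal sketch of a dependency entry with that scope. The artifactId is an assumed example rather than the one in this hunk, and the version is assumed to be managed by a parent pom. As far as I understand Maven's behavior, a 'provided' dependency on another reactor module still orders that module ahead of hadoop-yarn-project in the build, while not being passed on transitively.

```xml
<!-- Sketch only: the artifactId is an illustrative assumption, not
     necessarily the dependency being reviewed in this hunk. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-yarn-server-router</artifactId>
  <!-- No <version> here: assumed to be managed by a parent pom. -->
  <!-- 'provided' is available at compile and test time but is not
       transitive. -->
  <scope>provided</scope>
</dependency>
```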
> Parallel Maven Build Support for Apache Hadoop
> ----------------------------------------------
>
> Key: HADOOP-19019
> URL: https://issues.apache.org/jira/browse/HADOOP-19019
> Project: Hadoop Common
> Issue Type: Improvement
> Components: build
> Reporter: caijialiang
> Priority: Major
> Labels: pull-request-available
> Attachments: patch11-HDFS-17287.diff
>
>
> Why compilation is slow: Hadoop has many modules, and because they cannot be
> compiled in parallel, builds take a long time. A first build of Hadoop can
> take several hours, and even with dependencies already in the local Maven
> repository, a subsequent build can still take close to 40 minutes.
> How to solve it: Use {{mvn dependency:tree}} and {{maven-to-plantuml}} to
> investigate the dependency issues that prevent parallel compilation.
> * Investigate the dependencies between project modules.
> * Analyze the dependencies in multi-module Maven projects.
> * Download {{maven-to-plantuml}}:
>
> {{wget https://github.com/phxql/maven-to-plantuml/releases/download/v1.0/maven-to-plantuml-1.0.jar}}
> * Generate a dependency tree:
>
> {{mvn dependency:tree > dep.txt}}
> * Generate a UML diagram from the dependency tree:
>
> {{java -jar maven-to-plantuml-1.0.jar --input dep.txt --output dep.puml}}
> For more information, visit: [maven-to-plantuml GitHub
> repository|https://github.com/phxql/maven-to-plantuml/tree/master].
>
> *Hadoop Parallel Compilation Submission Logic*
> # Reasons for Parallel Compilation Failure
> ** In sequential compilation, modules are compiled one at a time in module
> order, so no errors occur.
> ** In parallel compilation, modules are compiled concurrently, and the build
> order is determined by the inter-module dependencies: if Module A depends on
> Module B, Module B is compiled before Module A.
> When Hadoop is compiled in parallel, for example when compiling
> {{hadoop-yarn-project}}, the dependencies between modules are correct.
> The issue arises at the dist packaging stage, because {{dist}} packages the
> output of all the other compiled modules.
> *Behavior of {{hadoop-yarn-project}} in Serial Compilation:*
> ** Serial compilation builds the modules listed in the pom one by one, in
> sequence. After all of them are compiled, it builds {{hadoop-yarn-project}}.
> During the {{prepare-package}} phase, the {{maven-assembly-plugin}} runs and
> repackages all module outputs as described in
> {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}}.
> *Behavior of {{hadoop-yarn-project}} in Parallel Compilation:*
> ** Parallel compilation builds modules in dependency order. Modules that do
> not declare a {{dependency}} on each other are compiled in parallel.
> Following the dependencies declared in the pom of {{hadoop-yarn-project}},
> those dependencies are compiled first, then {{hadoop-yarn-project}} itself,
> which runs its {{maven-assembly-plugin}}.
> ** However, not all of the files that
> {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}}
> needs for packaging come from modules declared as a {{dependency}} of
> {{hadoop-yarn-project}}. When {{hadoop-yarn-project}} is built and
> {{maven-assembly-plugin}} runs, some required modules may not have been
> built yet, which causes the errors in parallel compilation.
> *Solution:*
> ** The fix is relatively straightforward: collect all modules referenced in
> {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}}
> and declare them as dependencies in the pom of {{hadoop-yarn-project}}.