[
https://issues.apache.org/jira/browse/HADOOP-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801093#comment-17801093
]
Xiaoqiao He commented on HADOOP-19019:
--------------------------------------
Thanks [~jialiang] for your work. Moved this issue from the HDFS module to the COMMON module.
> Parallel Maven Build Support for Apache Hadoop
> ----------------------------------------------
>
> Key: HADOOP-19019
> URL: https://issues.apache.org/jira/browse/HADOOP-19019
> Project: Hadoop Common
> Issue Type: Improvement
> Components: build
> Reporter: caijialiang
> Priority: Major
> Labels: pull-request-available
> Attachments: patch11-HDFS-17287.diff
>
>
> Why compilation is slow: The Hadoop project has many modules, and because
> they cannot be compiled in parallel, the build is slow. For instance, the
> first compilation of Hadoop might take several hours, and even with Maven
> dependencies cached locally, a subsequent compilation can still take close
> to 40 minutes.
> How to solve it: Use {{mvn dependency:tree}} and {{maven-to-plantuml}} to
> investigate the dependency issues that prevent parallel compilation.
> * Investigate the dependencies between project modules.
> * Analyze the dependencies in multi-module Maven projects.
> * Download {{maven-to-plantuml}}:
>
> {{wget https://github.com/phxql/maven-to-plantuml/releases/download/v1.0/maven-to-plantuml-1.0.jar}}
> * Generate a dependency tree:
>
> {{mvn dependency:tree > dep.txt}}
> * Generate a UML diagram from the dependency tree:
>
> {{java -jar maven-to-plantuml-1.0.jar --input dep.txt --output dep.puml}}
> For more information, visit: [maven-to-plantuml GitHub
> repository|https://github.com/phxql/maven-to-plantuml/tree/master].
> The following is an English translation of the Hadoop PR description:
> *Hadoop Parallel Compilation Submission Logic*
> # Reasons for Parallel Compilation Failure
> ** In sequential compilation, modules are compiled one at a time in the
> order of the module sequence, so no errors occur.
> ** In parallel compilation, modules are compiled concurrently, and the
> compilation order is determined by the inter-module dependencies. If
> Module A depends on Module B, then Module B is compiled before Module A,
> so the compilation order follows the dependencies between modules.
> When Hadoop is compiled in parallel, for example when compiling
> {{hadoop-yarn-project}}, the dependencies between modules are correct.
> The issue arises during the dist packaging stage: {{dist}} packages all
> other compiled modules.
> *Behavior of {{hadoop-yarn-project}} in Serial Compilation:*
> ** Serial compilation compiles the modules listed in the pom one by one, in
> sequence. After all modules are compiled, it compiles
> {{hadoop-yarn-project}}. During the {{prepare-package}} phase, the
> {{maven-assembly-plugin}} is executed for packaging. All packages are
> repackaged according to the descriptor
> {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}}.
> *Behavior of {{hadoop-yarn-project}} in Parallel Compilation:*
> ** Parallel compilation compiles modules according to the dependency order
> among them. Modules that do not declare dependencies on each other through
> {{dependency}} are compiled in parallel. According to the dependency
> definitions in the pom of {{hadoop-yarn-project}}, its dependencies are
> compiled first, followed by {{hadoop-yarn-project}} itself, which then
> executes its {{maven-assembly-plugin}}.
> ** However, the modules needed for packaging in
> {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}} are
> not all declared as {{dependency}} entries in the pom of
> {{hadoop-yarn-project}}. Therefore, when {{hadoop-yarn-project}} is compiled
> and {{maven-assembly-plugin}} is executed, some required modules may not
> have been built yet, leading to errors in parallel compilation.
> *Solution:*
> ** The solution is relatively straightforward: collect all modules
> referenced in
> {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}},
> and declare them as dependencies in the pom of {{hadoop-yarn-project}}.
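
The fix described in the quoted issue amounts to finding modules that the assembly descriptor packages but the pom does not declare as dependencies. A minimal sketch of such a check follows; it is not part of the attached patch. It assumes the descriptor lists modules as groupId:artifactId entries under moduleSet/includes/include, and the function names (assembly_modules, declared_dependencies, missing_dependencies) are hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace, e.g. '{urn:x}include' -> 'include'."""
    return tag.rsplit("}", 1)[-1]


def assembly_modules(assembly_xml):
    """Collect entries listed directly under <moduleSet>/<includes>.

    Assumption: each <include> holds a 'groupId:artifactId' string.
    """
    root = ET.fromstring(assembly_xml)
    found = set()
    for module_set in root.iter():
        if local(module_set.tag) != "moduleSet":
            continue
        for includes in module_set:
            if local(includes.tag) != "includes":
                continue
            for include in includes:
                if local(include.tag) == "include" and include.text:
                    found.add(include.text.strip())
    return found


def declared_dependencies(pom_xml):
    """Collect 'groupId:artifactId' for top-level <dependencies> of a pom.

    Entries under <dependencyManagement> are deliberately ignored, since
    they do not create a build-order edge by themselves.
    """
    root = ET.fromstring(pom_xml)
    declared = set()
    for dependencies in root:
        if local(dependencies.tag) != "dependencies":
            continue
        for dep in dependencies:
            if local(dep.tag) != "dependency":
                continue
            fields = {local(f.tag): (f.text or "").strip() for f in dep}
            declared.add(
                f"{fields.get('groupId', '')}:{fields.get('artifactId', '')}"
            )
    return declared


def missing_dependencies(assembly_xml, pom_xml):
    """Modules the assembly packages but the pom does not depend on."""
    return sorted(assembly_modules(assembly_xml) - declared_dependencies(pom_xml))
```

To run it against real files, read each file as bytes (for example open(path, "rb").read()) and pass both contents to missing_dependencies. Each entry it returns is a candidate to add as a dependency in the pom of hadoop-yarn-project.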
--
This message was sent by Atlassian Jira
(v8.20.10#820010)