lhotari edited a comment on pull request #8485:
URL: https://github.com/apache/pulsar/pull/8485#issuecomment-724474645


   I'm currently experimenting with a solution where the build is split into 
multiple phases:
   
   1. license check, build Pulsar artifacts
   2. run unit tests
   3. build docker images
   4. run integration tests
   
   Each phase is a "job" in a GitHub Actions workflow. The unit test and 
integration test jobs run parallel sub-jobs by using the matrix feature of 
GitHub Actions.
   
   The challenge is the large size of Pulsar artifacts. Currently the 
~/.m2/repository/org/apache/pulsar files installed with "mvn install" are about 
2.5 GB in size.
   Breakdown of directory sizes in MB:
   https://gist.github.com/lhotari/3da3b220edd5684e54a005f358f3d045
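
A breakdown like the one in the gist can be reproduced with a short script. This is only a sketch: the function names `dir_size_bytes` and `size_breakdown_mb` are made up for illustration, and the script simply sums file sizes per top-level directory, which may not match how the gist was actually produced.

```python
import os
from pathlib import Path


def dir_size_bytes(path):
    """Sum the sizes of all regular files below path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def size_breakdown_mb(base):
    """Return (directory name, size in MB) pairs, largest first."""
    base = Path(base)
    sizes = {
        child.name: dir_size_bytes(child) / (1024 * 1024)
        for child in base.iterdir()
        if child.is_dir()
    }
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Location of the installed Pulsar artifacts, as mentioned in the comment.
    repo = Path.home() / ".m2" / "repository" / "org" / "apache" / "pulsar"
    if repo.is_dir():
        for name, size_mb in size_breakdown_mb(repo):
            print(f"{size_mb:8.1f}  {name}")
```

Running it after `mvn install` prints one line per module directory, so the largest contributors show up at the top.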
   
   The large size of the artifacts seems to be caused by shaded and bundled 
dependencies. 
   The bundled dependencies seem to come from the pulsar-io modules built with 
[nifi-nar-maven-plugin](https://github.com/apache/nifi-maven). This results in 
excessive IO during builds.
   
   The solution seems to be to create another Maven profile that builds just 
the essentials for running unit tests. Unit tests should be able to run 
without building the shaded jars, the distribution or the nar modules with 
the embedded dependencies.
   
   Perhaps there's also a way to share the dependencies across the Pulsar IO 
nar modules. It seems like a waste to duplicate most of the same dependencies 
in each nar file.
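
To get a rough idea of how much duplication there is, one could count how many nar files bundle the same jar. The sketch below assumes that a nar file is a zip archive with its bundled jars under `META-INF/bundled-dependencies/`; that layout and the helper names `bundled_jars` and `shared_jars` are assumptions for illustration and should be checked against the actual nar files.

```python
import zipfile
from collections import Counter

# Assumed location of bundled jars inside a nar archive.
BUNDLED_PREFIX = "META-INF/bundled-dependencies/"


def bundled_jars(nar_path):
    """Return the set of jar file names bundled in a single nar archive."""
    with zipfile.ZipFile(nar_path) as nar:
        return {
            name[len(BUNDLED_PREFIX):]
            for name in nar.namelist()
            if name.startswith(BUNDLED_PREFIX) and name.endswith(".jar")
        }


def shared_jars(nar_paths):
    """Map each jar name to the number of nar files that bundle it.

    Only jars that appear in more than one nar file are returned.
    """
    counts = Counter()
    for nar_path in nar_paths:
        counts.update(bundled_jars(nar_path))
    return {jar: count for jar, count in counts.items() if count > 1}
```

Calling `shared_jars` with the paths of the built nar files would list the jars that are candidates for sharing. Matching is by file name only, so different versions of the same library are counted separately.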
   
   ---
   
   The output of the core-modules Maven profile is already much smaller than 
the total size of the Pulsar artifacts. 
   `mvn install -Pcore-modules -DskipTests` produces about 377MB in 
~/.m2/repository/org/apache/pulsar:
   https://gist.github.com/lhotari/b6f51edc935787b530055a20bc685394
   This could be reduced by 269MB, from 377MB to 108MB, by removing the 
"distribution/server" module from the core-modules profile.
   
   Since the artifact cache of GitHub Actions is limited to 5GB in total per 
repository, this reduction would help a lot in making it possible to use the 
cache for sharing the artifacts between the first phase and the later unit 
test and integration test phases of the build.
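
The numbers above can be put side by side against the 5GB limit. This is a back-of-the-envelope sketch: it assumes 1GB = 1024MB and compares on-disk sizes, while the size of the stored cache entry may differ (for example if the cache is compressed). The name `cache_share` is made up for illustration.

```python
# Cache limit mentioned in the comment: 5GB per repository (1GB = 1024MB assumed).
CACHE_LIMIT_MB = 5 * 1024


def cache_share(size_mb, limit_mb=CACHE_LIMIT_MB):
    """Return the percentage of the cache limit taken by size_mb."""
    return 100.0 * size_mb / limit_mb


# Sizes taken from the comment, in MB.
sizes_mb = {
    "full mvn install (about 2.5 GB)": 2.5 * 1024,
    "core-modules profile": 377,
    "core-modules without distribution/server": 377 - 269,
}

for label, size_mb in sizes_mb.items():
    print(f"{label}: {size_mb:.0f} MB, {cache_share(size_mb):.1f}% of the cache limit")
```

A single copy of the full artifacts would take about half of the limit, while the trimmed core-modules output leaves most of it free for other cache entries.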



