jorgecarleitao commented on pull request #8798:
URL: https://github.com/apache/arrow/pull/8798#issuecomment-735375266


   (this is not for now and not blocking this PR, it is just a thought)
   
   Not necessarily per target or crate:
   
   As I see it, we have builds that run sequentially in `script.sh` but do not 
benefit from being built in sequence. Even when they do benefit, I think we 
would gain from splitting them into multiple jobs and using artifacts to share 
the common state (`target` in this case) between them, so that caching is more 
granular.
   
   More generally, our builds are basically a DAG where each node is an 
execution that benefits from having artifacts available:
   
   * build arrow ext dependencies libs: `[]`
   * build arrow lib: `[arrow ext dependencies libs]`
   * build parquet lib: `[parquet ext dependencies libs, arrow lib]`
   * build arrow tests bin: `[arrow tests ext dependencies, arrow lib]`
   * build parquet tests bin: `[arrow tests ext dependencies, arrow lib, parquet lib]`
   * run tests: `[arrow tests bin]`
   
   A feature flag is a new build of the lib+bin, but typically shares the same 
external dependencies and thus would be something like
   
   * build arrow lib simd: `[arrow ext dependencies libs]`
   * build arrow tests bin: `[arrow tests ext dependencies, arrow lib simd]`
   * run tests `simd`: `[arrow tests bin]`
   
   An architecture is a new complete build and does not share a lineage with 
other architectures.
   
   Currently we run our DAG in sequence. However, many nodes in this DAG do 
not depend on each other and can run in parallel (as separate jobs in a GitHub 
workflow).
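
   To make the parallelism concrete, here is a rough Python sketch (not part of 
our CI) that encodes the first list above as a dependency map and groups the 
nodes into waves that could run at the same time. The node names are taken from 
the list; treating `parquet` as `parquet lib`, and adding the external 
dependency sets as root nodes of their own, are assumptions made only for this 
illustration. `parallel_waves` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Node -> set of nodes whose artifacts it needs.
# Assumption: the ext-dependency entries are modelled as root nodes,
# and "parquet" in the original list is read as "parquet lib".
BUILD_DAG = {
    "arrow ext dependencies libs": set(),
    "parquet ext dependencies libs": set(),
    "arrow tests ext dependencies": set(),
    "arrow lib": {"arrow ext dependencies libs"},
    "parquet lib": {"parquet ext dependencies libs", "arrow lib"},
    "arrow tests bin": {"arrow tests ext dependencies", "arrow lib"},
    "parquet tests bin": {
        "arrow tests ext dependencies",
        "arrow lib",
        "parquet lib",
    },
    "run tests": {"arrow tests bin"},
}


def parallel_waves(dag):
    """Group nodes into waves; nodes within one wave do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # also raises CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose inputs are built
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


waves = parallel_waves(BUILD_DAG)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {wave}")
```

   With this encoding the eight nodes fall into four waves, so a scheduler that 
runs each wave's nodes as parallel jobs needs four sequential steps instead of 
eight. Actual savings depend on how long each node takes, which this sketch 
does not model.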
   
   IMO if we split our build into separate jobs that each output an artifact 
and arrange these jobs as a DAG, we can run our pipeline faster by exploiting 
the parallelism of the build.
   
   This is something that I raised on the mailing list in the context of the 
integration tests, but it also applies to our own builds.

