jorgecarleitao commented on pull request #8798: URL: https://github.com/apache/arrow/pull/8798#issuecomment-735375266
(this is not for now and not blocking this PR, it is just a thought)

Not necessarily per target or crate: as I see it, we have builds that we run sequentially in `script.sh` but that do not benefit from being built in sequence. Even when they do benefit, I think that we would gain from splitting them into multiple jobs and using artifacts to share the common state (`target` in this case) between them, so that caching is more granular.

More generally, our builds are basically a DAG where each node is an execution that benefits from having artifacts available:

* build arrow ext dependencies libs: `[]`
* build arrow lib: `[arrow ext dependencies libs]`
* build parquet lib: `[parquet ext dependencies libs, arrow lib]`
* build arrow tests bin: `[arrow tests ext dependencies, arrow lib]`
* build parquet tests bin: `[arrow tests ext dependencies, arrow lib, parquet]`
* run tests: `[arrow tests bin]`

A feature flag is a new build of the lib+bin, but it typically shares the same external dependencies and thus would be something like

* build arrow lib simd: `[arrow ext dependencies libs]`
* build arrow tests bin: `[arrow tests ext dependencies, arrow lib simd]`
* run tests `simd`: `[arrow tests bin]`

An architecture is a new complete build and does not share a lineage with other architectures.

Currently we run our DAG in sequence. However, there are many nodes on this DAG that do not depend on each other and can run in parallel (as different jobs in a GitHub workflow). IMO if we split our build into different jobs that each output an artifact and create a DAG of these jobs, we will be able to run our pipeline faster by leveraging the parallelism of the build.

This is something that I raised on the mailing list in the context of the integration tests, but it is also applicable to our own builds.
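To make the parallelism argument concrete, here is a minimal sketch in Python (not part of the Arrow CI) that groups the nodes of the DAG described in the comment into stages whose jobs could run at the same time. The job names paraphrase the list in the comment; treating the external-dependency builds as roots with no inputs, and the `parallel_stages` helper itself, are assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each job maps to the set of jobs whose artifacts it needs.
# Names paraphrase the comment; the ext-dependency builds are
# assumed to be roots (no inputs of their own).
DAG = {
    "arrow ext deps": set(),
    "parquet ext deps": set(),
    "arrow tests ext deps": set(),
    "arrow lib": {"arrow ext deps"},
    "parquet lib": {"parquet ext deps", "arrow lib"},
    "arrow tests bin": {"arrow tests ext deps", "arrow lib"},
    "parquet tests bin": {"arrow tests ext deps", "arrow lib", "parquet lib"},
    "run tests": {"arrow tests bin"},
}


def parallel_stages(dag):
    """Return a list of stages; jobs within one stage have no
    dependencies on each other and could run as parallel jobs."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every job whose inputs are done
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage as finished
    return stages


if __name__ == "__main__":
    for number, stage in enumerate(parallel_stages(DAG), start=1):
        print(f"stage {number}: {', '.join(stage)}")
```

Under these assumptions the eight jobs collapse into four stages, so the critical path is half the length of the fully sequential run. In a real pipeline each stage member would be a separate job that downloads the artifacts of the jobs it depends on.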