Hello,

I'm contacting you on behalf of the LCG Releases team at CERN. We
provide a common software stack for LHCb, ATLAS and other experiments,
for use at CERN and on the Worldwide LHC Computing Grid.

We are currently looking into optimizing how we build Apache Arrow
(C++ and Python) and its dependencies. Ideally, we would like to build
Arrow with only the minimum set of dependencies it needs to run, and to
satisfy those dependencies with packages already installed in the
stack. The former would keep the stack clean; the latter would help us
avoid duplicate packages and build failures caused by mirrors going
offline.

Our builds currently use the ARROW_DEPENDENCY_SOURCE=AUTO setting
(see <https://github.com/apache/arrow/blob/master/docs/source/developers/cpp.rst>),
which causes Arrow to download duplicate and non-essential packages and
makes the build depend on external mirrors. Setting it to SYSTEM avoids
the downloads, but the build then fails because dependencies we do not
actually use are not found on the system.

Do you know if there is a recommended way to achieve this? The problem
seems to stem from the fact that all listed dependencies are downloaded
or required, whether or not they are actually needed. We have
considered two workarounds: patching the non-essential dependencies
('double-conversion', 'GTEST', etc.) out of the dependency list, or
formally adding the unneeded dependencies to the stack so that we can
build with the SYSTEM setting. However, if there is a proper way to do
this, we would of course prefer to follow it.

Any help would be much appreciated.

Kind regards,
- Richard Bachmann