hi folks,

Lately there seem to be more and more people suggesting that the optional components in the Arrow C++ project are getting in the way of using the "core", which implements the columnar format and IPC protocol. I am not sure I agree with this argument, but in general I think it would be a good idea to make all optional components in the project "opt in" rather than "opt out".
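To make "opt in" concrete: it would mean every optional component defaults to OFF and builds only when explicitly requested. Here is a minimal sketch using CMake's standard `option()` command. The option names are the ones from the build flags below; the help strings are illustrative only, and this is not the project's actual CMake code.

```cmake
# Hypothetical sketch: optional components default to OFF ("opt in"),
# so a plain `cmake ..` builds only the core format and IPC code.
option(ARROW_COMPUTE "Build the Arrow compute modules" OFF)
option(ARROW_DATASET "Build the Arrow dataset modules" OFF)
option(ARROW_USE_GLOG "Use glog for logging" OFF)
option(ARROW_WITH_ZSTD "Build with zstd compression support" OFF)
option(ARROW_BUILD_UTILITIES "Build integration-testing utilities" OFF)

# Off by default, but opting in should yield better performance
option(ARROW_JEMALLOC "Use jemalloc as the default memory allocator" OFF)

if(ARROW_JEMALLOC)
  message(STATUS "jemalloc enabled (opted in)")
endif()
```

A packager who wants these pieces would then pass them explicitly, e.g. `cmake .. -DARROW_COMPUTE=ON -DARROW_JEMALLOC=ON -DARROW_WITH_ZSTD=ON`.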
To demonstrate where things currently stand, I created a Dockerfile that tries to produce the smallest possible and most dependency-free build:

https://github.com/wesm/arrow/tree/cpp-minimal-dockerfile/dev/cpp_minimal

Here is the output of this build:

https://gist.github.com/wesm/02328fbb463033ed486721b8265f755f

First, let's look at the CMake invocation:

    cmake .. -DBOOST_SOURCE=BUNDLED \
             -DARROW_BOOST_USE_SHARED=OFF \
             -DARROW_COMPUTE=OFF \
             -DARROW_DATASET=OFF \
             -DARROW_JEMALLOC=OFF \
             -DARROW_JSON=ON \
             -DARROW_USE_GLOG=OFF \
             -DARROW_WITH_BZ2=OFF \
             -DARROW_WITH_ZLIB=OFF \
             -DARROW_WITH_ZSTD=OFF \
             -DARROW_WITH_LZ4=OFF \
             -DARROW_WITH_SNAPPY=OFF \
             -DARROW_WITH_BROTLI=OFF \
             -DARROW_BUILD_UTILITIES=OFF

Aside from the issue of how to obtain and link Boost, here are a few observations:

* COMPUTE and DATASET IMHO should be off by default
* All compression libraries should be off by default
* GLOG should be off by default
* Utilities should be off by default (they are used for integration testing)
* Jemalloc should probably be off, but we should make it clear that opting in will yield better performance

I found that it wasn't possible to set ARROW_JSON=OFF without breaking the build. I opened ARROW-6590 to fix this.

Aside from potentially changing these defaults, there are some things in the build that we might want to turn into optional pieces:

* We should see if we can make boost::filesystem not mandatory in the barebones build, if only to satisfy the peanut gallery
* double-conversion is used in the CSV module. I think that double-conversion_ep and the CSV module should both be made opt-in
* rapidjson_ep should be made optional. JSON support is only needed for integration testing

We could also discuss vendoring flatbuffers.h so that flatbuffers_ep is not mandatory.

In general, enabling optional components is primarily relevant for packagers. If we implement these changes, a number of package build scripts will have to change.

Thanks,
Wes