hi folks,

Lately, more and more people seem to be suggesting that the optional
components in the Arrow C++ project are getting in the way of using
the "core", which implements the columnar format and IPC protocol. I
am not sure I agree with this argument, but in general I think it
would be a good idea to make all optional components in the project
"opt in" rather than "opt out".

To demonstrate where things currently stand, I created a Dockerfile
that tries to produce the smallest possible, most dependency-free build:

https://github.com/wesm/arrow/tree/cpp-minimal-dockerfile/dev/cpp_minimal

Here is the output of this build:

https://gist.github.com/wesm/02328fbb463033ed486721b8265f755f

First, let's look at the CMake invocation:

cmake .. -DBOOST_SOURCE=BUNDLED \
-DARROW_BOOST_USE_SHARED=OFF \
-DARROW_COMPUTE=OFF \
-DARROW_DATASET=OFF \
-DARROW_JEMALLOC=OFF \
-DARROW_JSON=ON \
-DARROW_USE_GLOG=OFF \
-DARROW_WITH_BZ2=OFF \
-DARROW_WITH_ZLIB=OFF \
-DARROW_WITH_ZSTD=OFF \
-DARROW_WITH_LZ4=OFF \
-DARROW_WITH_SNAPPY=OFF \
-DARROW_WITH_BROTLI=OFF \
-DARROW_BUILD_UTILITIES=OFF

Aside from the issue of how to obtain and link Boost, here are a few
observations:

* COMPUTE and DATASET IMHO should be off by default
* All compression libraries should be off by default
* GLOG should be off by default
* Utilities should be off (they are used for integration testing)
* Jemalloc should probably be off, but we should make it clear that
opting in will yield better performance
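
As a rough sketch of what "opt in" could look like, assuming plain
CMake option() declarations (Arrow's own build scripts may declare
these differently, and the description strings are made up for
illustration), the defaults above would become:

```cmake
# Sketch only: optional components declared with OFF defaults so that
# users opt in. Flag names match the invocation above; descriptions
# are illustrative.
option(ARROW_COMPUTE "Build the Arrow compute modules" OFF)
option(ARROW_DATASET "Build the Arrow dataset modules" OFF)
option(ARROW_USE_GLOG "Use glog for logging" OFF)
option(ARROW_BUILD_UTILITIES "Build utilities used for integration testing" OFF)
option(ARROW_JEMALLOC "Use jemalloc as the memory allocator" OFF)

# All compression libraries default to off
option(ARROW_WITH_BZ2 "Build with bz2 support" OFF)
option(ARROW_WITH_ZLIB "Build with zlib support" OFF)
option(ARROW_WITH_ZSTD "Build with zstd support" OFF)
option(ARROW_WITH_LZ4 "Build with lz4 support" OFF)
option(ARROW_WITH_SNAPPY "Build with snappy support" OFF)
option(ARROW_WITH_BROTLI "Build with brotli support" OFF)

# Make it visible at configure time that opting in to jemalloc is
# expected to perform better
if(NOT ARROW_JEMALLOC)
  message(STATUS "ARROW_JEMALLOC is OFF; pass -DARROW_JEMALLOC=ON for better performance")
endif()
```

The status message is just one way to surface the jemalloc trade-off;
a note in the build documentation would serve the same purpose.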

I found that it wasn't possible to set ARROW_JSON=OFF without breaking
the build. I opened ARROW-6590 to fix this.

Aside from potentially changing these defaults, there are some parts of
the build that we might want to turn into optional pieces:

* We should see whether boost::filesystem can be made optional in the
barebones build, if only to satisfy the peanut gallery
* double-conversion is used in the CSV module. I think that
double-conversion_ep and the CSV module should both be made opt-in
* rapidjson_ep should be made optional. JSON support is only needed
for integration testing
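
A minimal sketch of gating these dependencies on the components that
need them. ARROW_CSV is a hypothetical flag name, and the find_package
calls assume that double-conversion and RapidJSON install their CMake
package configs. The real toolchain logic, with its bundled *_ep
fallbacks, is more involved:

```cmake
# Hypothetical opt-in flag for the CSV module (name assumed)
option(ARROW_CSV "Build the Arrow CSV reader" OFF)
# Existing JSON flag, shown here with an opt-in default
option(ARROW_JSON "Build Arrow JSON support" OFF)

if(ARROW_CSV)
  # double-conversion is only used by the CSV module, so only resolve
  # it when CSV support is requested
  find_package(double-conversion REQUIRED)
endif()

if(ARROW_JSON)
  # RapidJSON is only needed for JSON support
  find_package(RapidJSON REQUIRED)
endif()
```

With this structure a barebones configure never looks for either
library unless the corresponding component is switched on.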

We could also discuss vendoring flatbuffers.h so that flatbuffers_ep
is not mandatory.

In general, the choice of which optional components to enable is
primarily relevant for packagers. If we make these changes, a number
of package build scripts will have to change to explicitly opt in to
the components they currently get by default.
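
For example, a packager who wants to keep today's fuller build could
collect the flags in an initial-cache file and pass it with cmake -C.
The file name and the particular set of components below are only
illustrative:

```cmake
# packaging.cmake (hypothetical file name), used as:
#   cmake -C packaging.cmake ..
# These entries land in the cache before the project's option() calls
# run, and option() does not overwrite an existing cache value, so they
# take precedence over OFF defaults.
set(ARROW_COMPUTE ON CACHE BOOL "")
set(ARROW_DATASET ON CACHE BOOL "")
set(ARROW_JSON ON CACHE BOOL "")
set(ARROW_JEMALLOC ON CACHE BOOL "")
set(ARROW_WITH_ZLIB ON CACHE BOOL "")
set(ARROW_WITH_ZSTD ON CACHE BOOL "")
set(ARROW_WITH_LZ4 ON CACHE BOOL "")
set(ARROW_WITH_SNAPPY ON CACHE BOOL "")
set(ARROW_WITH_BROTLI ON CACHE BOOL "")
```

Keeping the list in one file means the packaging change is mostly a
one-time edit rather than a long command line in each build script.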

Thanks,
Wes
