One thing I forgot to mention: One of the things driving the creation of new shared libraries is interdependencies. For example:

libarrow -> libparquet
libarrow -> libarrow_dataset
libparquet -> libarrow_dataset

With the modular LLVM-like approach this issue goes away.

On Thu, Sep 12, 2019 at 1:16 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> I forgot to add the link to the LLVM library listing
>
> https://gist.github.com/wesm/d13c2844db0c19477e8ee5c95e36a0dc
>
> On Thu, Sep 12, 2019 at 1:14 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > hi folks,
> >
> > I wanted to share some concerns that I have about our current
> > trajectory with regards to producing shared libraries from the Arrow
> > build system.
> >
> > Currently, a comprehensive build produces many shared libraries:
> >
> > * libarrow
> > * libarrow_dataset
> > * libarrow_flight
> > * libarrow_python
> > * libgandiva
> > * libparquet
> > * libplasma
> >
> > There are some others. There are a number of problems with the current
> > approach:
> >
> > * Each DLL needs its own set of "visibility" macros to control the use
> > of __declspec(dllimport/dllexport) on Windows, which is necessary to
> > instruct the import or export of symbols between DLLs on Windows. See
> > e.g.
> > https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/visibility.h
> >
> > * Templates instantiated in one DLL may cause a violation of the One
> > Definition Rule during linking (we lost at least a day of work time
> > collectively to issues around this in ARROW-6244). It is good to be
> > able to share common template interfaces in general
> >
> > * Statically-linked dependencies in one shared lib may need to be
> > statically linked into another library. For example, libgandiva
> > statically links parts of LLVM, but we will likely have some other
> > code that makes use of LLVM for other purposes (it has been discussed
> > in the context of Avro parsing)
> >
> > Overall, my preferred solution to these issues is to move to a similar
> > approach to what the LLVM project does. To help understand, let me
> > have you first look at the libraries that come from the llvm-7-dev
> > package on Ubuntu
> >
> > Here we have a collection of static "module" libraries that implement
> > different parts of the LLVM platform. Finally, a _single_ shared
> > library libLLVM-7.so is produced.
> >
> > I think we should do the same thing in Apache Arrow. So we only ever
> > will produce a single shared library from the build. We can
> > additionally make the "name" of this shared library configurable to
> > suit different needs. For example, the default name could be simply
> > "libarrow.so" or something. But if someone wants to produce a
> > barebones Parquet shared library they can override the name to create
> > a "libparquet.so" that contains only the "libarrow_core.a" and
> > "libarrow_io.a" symbols needed for reading Parquet files.
> >
> > This would have additional benefits:
> >
> > * Use the same visibility macros for all exported C++ symbols, rather
> > than having to define DLL-specific visibility
> >
> > * Improved modularization of builds and linking for third party users,
> > similar to the way that LLVM's modular linking works, see the way that
> > Gandiva requests specific components from LLVM to use for static
> > linking
> > https://github.com/apache/arrow/blob/master/cpp/cmake_modules/FindLLVM.cmake#L53
> >
> > * Net simpler linking and deployment. Only one shared library to deal with
> >
> > There are some drawbacks, however:
> >
> > * Our C++ Linux packaging approach would need to be changed to be more
> > LLVM-like (a single .deb/.yum package containing the C++ platform
> > rather than many packages as now)
> >
> > Interested to hear from other C++ developers.
> >
> > Thanks
> > Wes
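
To make the visibility-macro point from the quoted message concrete, here is a minimal C++ sketch of what a single shared export macro could look like when every module ends up in one shared library. The names ARROW_SKETCH_EXPORT, ARROW_SKETCH_EXPORTING, and the sketch:: namespaces and functions are hypothetical, chosen only for illustration; they are not the macros or APIs that Arrow actually defines.

```cpp
// Hypothetical: normally the build system would define this only while
// compiling the single shared library, not in consumer code.
#define ARROW_SKETCH_EXPORTING

// One export macro shared by every module, instead of one macro per DLL.
#if defined(_WIN32) || defined(__CYGWIN__)
#  ifdef ARROW_SKETCH_EXPORTING
#    define ARROW_SKETCH_EXPORT __declspec(dllexport)
#  else
#    define ARROW_SKETCH_EXPORT __declspec(dllimport)
#  endif
#else
#  define ARROW_SKETCH_EXPORT __attribute__((visibility("default")))
#endif

namespace sketch {

// Stand-in for a symbol from a "core" static module.
namespace core {
ARROW_SKETCH_EXPORT int AddOne(int x) { return x + 1; }
}  // namespace core

// Stand-in for a symbol from a "parquet" static module that depends on core.
// It uses the same macro because both modules land in the same shared library.
namespace parquet {
ARROW_SKETCH_EXPORT int AddTwo(int x) { return core::AddOne(core::AddOne(x)); }
}  // namespace parquet

}  // namespace sketch
```

Since both stand-in modules are linked into one library, a call from one module into the other never crosses a DLL boundary, so the dllimport/dllexport decision is made once for the whole library rather than once per module.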