One thing I forgot to mention:

One of the things driving the creation of new shared libraries is
interdependencies. For example:

libarrow -> libparquet
libarrow -> libarrow_dataset
libparquet -> libarrow_dataset

With the modular LLVM-like approach this issue goes away.

On Thu, Sep 12, 2019 at 1:16 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> I forgot to add the link to the LLVM library listing
>
> https://gist.github.com/wesm/d13c2844db0c19477e8ee5c95e36a0dc
>
> On Thu, Sep 12, 2019 at 1:14 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > hi folks,
> >
> > I wanted to share some concerns that I have about our current
> > trajectory with regard to producing shared libraries from the Arrow
> > build system.
> >
> > Currently, a comprehensive build produces many shared libraries:
> >
> > * libarrow
> > * libarrow_dataset
> > * libarrow_flight
> > * libarrow_python
> > * libgandiva
> > * libparquet
> > * libplasma
> >
> > There are some others. There are a number of problems with the current 
> > approach:
> >
> > * Each DLL needs its own set of "visibility" macros to control the use
> > of __declspec(dllimport/dllexport), which Windows requires in order to
> > import or export symbols between DLLs. See e.g.
> > https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/visibility.h
> >
> > * Templates instantiated in one DLL may cause a violation of the One
> > Definition Rule during linking (we collectively lost at least a day of
> > work time to issues around this in ARROW-6244). It is good to be
> > able to share common template interfaces in general.
> >
> > * Statically-linked dependencies in one shared lib may need to be
> > statically linked into another library. For example, libgandiva
> > statically links parts of LLVM, but we will likely have some other
> > code that makes use of LLVM for other purposes (it has been discussed
> > in the context of Avro parsing)
> >
> > Overall, my preferred solution to these issues is to move to a similar
> > approach to what the LLVM project does. To help understand, let me
> > have you first look at the libraries that come from the llvm-7-dev
> > package on Ubuntu.
> >
> > Here we have a collection of static "module" libraries that implement
> > different parts of the LLVM platform. Finally, a _single_ shared
> > library libLLVM-7.so is produced.
> >
> > I think we should do the same thing in Apache Arrow, so that the build
> > only ever produces a single shared library. We can
> > additionally make the "name" of this shared library configurable to
> > suit different needs. For example, the default name could be simply
> > "libarrow.so" or something. But if someone wants to produce a
> > barebones Parquet shared library they can override the name to create
> > a "libparquet.so" that contains only the "libarrow_core.a" and
> > "libarrow_io.a" symbols needed for reading Parquet files.
> >
> > This would have additional benefits:
> >
> > * Use the same visibility macros for all exported C++ symbols, rather
> > than having to define DLL-specific visibility
> >
> > * Improved modularization of builds and linking for third party users,
> > similar to the way that LLVM's modular linking works, see the way that
> > Gandiva requests specific components from LLVM to use for static
> > linking 
> > https://github.com/apache/arrow/blob/master/cpp/cmake_modules/FindLLVM.cmake#L53
> >
> > * Net simpler linking and deployment. Only one shared library to deal with.
> >
> > There are some drawbacks, however:
> >
> > * Our C++ Linux packaging approach would need to be changed to be more
> > LLVM-like (a single .deb/.rpm package containing the C++ platform
> > rather than many packages as now)
> >
> > Interested to hear from other C++ developers.
> >
> > Thanks
> > Wes
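
To make the visibility-macro point in the quoted message concrete, here is a minimal sketch of what one shared export macro for a single shared library might look like. The names MYLIB_EXPORT, MYLIB_EXPORTING, MYLIB_STATIC and the mylib namespaces are hypothetical, chosen only for illustration; they are not Arrow's actual macros or APIs. The sketch assumes the build system defines MYLIB_EXPORTING when compiling the library itself.

```cpp
// Hypothetical single visibility header for a one-shared-library build.
// MYLIB_EXPORT, MYLIB_EXPORTING and MYLIB_STATIC are illustrative names,
// not Arrow's real macros.
#include <cstdint>

// Normally passed by the build system (e.g. -DMYLIB_EXPORTING) only while
// compiling the library; defined here so this single-file sketch builds.
#ifndef MYLIB_EXPORTING
#define MYLIB_EXPORTING
#endif

#if defined(_WIN32) || defined(__CYGWIN__)
#  if defined(MYLIB_STATIC)
#    define MYLIB_EXPORT  // static build: no import/export decoration
#  elif defined(MYLIB_EXPORTING)
#    define MYLIB_EXPORT __declspec(dllexport)  // building the DLL
#  else
#    define MYLIB_EXPORT __declspec(dllimport)  // consuming the DLL
#  endif
#else
// GCC/Clang: mark the symbol visible even under -fvisibility=hidden.
#  define MYLIB_EXPORT __attribute__((visibility("default")))
#endif

namespace mylib {

// Two "modules" that would be separate static libraries in the build but
// end up in the same shared library, so both use the same macro.
namespace core {
MYLIB_EXPORT std::int64_t Add(std::int64_t a, std::int64_t b);
}  // namespace core

namespace parquet {
MYLIB_EXPORT std::int64_t RowCount(std::int64_t num_groups,
                                   std::int64_t rows_per_group);
}  // namespace parquet

// Definitions (these would live in each module's .cc files).
namespace core {
std::int64_t Add(std::int64_t a, std::int64_t b) { return a + b; }
}  // namespace core

namespace parquet {
std::int64_t RowCount(std::int64_t num_groups, std::int64_t rows_per_group) {
  return num_groups * rows_per_group;
}
}  // namespace parquet

}  // namespace mylib
```

With several DLLs, each one would need its own copy of this conditional block under a different macro name, because a symbol exported from one DLL has to be declared as imported when seen from another. With one shared library, a caller just uses mylib::core::Add(2, 3) or mylib::parquet::RowCount(4, 100) and a single macro covers every module.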
