Hi,

I understand what problems we want to solve, especially the
template and DLL problem in ARROW-6244.

I feel that one shared library is overkill because we have many
namespaces. If we had only the arrow:: namespace, it would be
reasonable. But we have the arrow::, gandiva::, parquet:: and
plasma:: namespaces. It's a bit unnatural for libarrow.so to
include symbols from all of these namespaces. (I think that LLVM
uses only the llvm:: namespace.)

For the template and DLL case, I don't think that one shared
library is a solution. If we forget to instantiate a common
method in the one shared library, as in [1], users of the one
shared library will face the same problem, and it will only
surface after we release the one shared library.

[1] https://github.com/apache/arrow/pull/5221/commits/e88b2579f04451d741eeddcb6697914bcc1019a6

If we have multiple shared libraries, we can find the problem in
our development process without releasing a new version. (We may
also be able to find it through integration tests with other
projects.)

I think that the real solution for the template and DLL problem
is to prevent template instantiation in shared libraries. (I
don't know a way to check this... Sorry.)

Note that I'm not strongly opposed to the one shared library
idea. I just feel that it's overkill. Boost uses a
BOOST_${MODULE}_DECL approach like we currently do. So our
current approach isn't so bad...?


Thanks,
--
kou

In <CAJPUwMA2-y2EabvVs3wpV0KEBNez2cC0oxFOCOfzn_aym3=h...@mail.gmail.com>
  "[DISCUSS][C++] Rethinking our current C++ shared library (.so / .dll) approach" on Thu, 12 Sep 2019 13:14:55 -0500,
  Wes McKinney <wesmck...@gmail.com> wrote:

> hi folks,
>
> I wanted to share some concerns that I have about our current
> trajectory with regards to producing shared libraries from the Arrow
> build system.
>
> Currently, a comprehensive build produces many shared libraries:
>
> * libarrow
> * libarrow_dataset
> * libarrow_flight
> * libarrow_python
> * libgandiva
> * libparquet
> * libplasma
>
> There are some others.
> There are a number of problems with the current
> approach:
>
> * Each DLL needs its own set of "visibility" macros to control the use
> of __declspec(dllimport/dllexport) on Windows, which is necessary to
> instruct the import or export of symbols between DLLs on Windows. See
> e.g.
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/visibility.h
>
> * Templates instantiated in one DLL may cause a violation of the One
> Definition Rule during linking (we lost at least a day of work time
> collectively to issues around this in ARROW-6244). It is good to be
> able to share common template interfaces in general
>
> * Statically-linked dependencies in one shared lib may need to be
> statically linked into another library. For example, libgandiva
> statically links parts of LLVM, but we will likely have some other
> code that makes use of LLVM for other purposes (it has been discussed
> in the context of Avro parsing)
>
> Overall, my preferred solution to these issues is to move to a similar
> approach to what the LLVM project does. To help understand, let me
> have you first look at the libraries that come from the llvm-7-dev
> package on Ubuntu
>
> Here we have a collection of static "module" libraries that implement
> different parts of the LLVM platform. Finally, a _single_ shared
> library libLLVM-7.so is produced.
>
> I think we should do the same thing in Apache Arrow. So we only ever
> will produce a single shared library from the build. We can
> additionally make the "name" of this shared library configurable to
> suit different needs. For example, the default name could be simply
> "libarrow.so" or something. But if someone wants to produce a
> barebones Parquet shared library they can override the name to create
> a "libparquet.so" that contains only the "libarrow_core.a" and
> "libarrow_io.a" symbols needed for reading Parquet files.
>
> This would have additional benefits:
>
> * Use the same visibility macros for all exported C++ symbols, rather
> than having to define DLL-specific visibility
>
> * Improved modularization of builds and linking for third party users,
> similar to the way that LLVM's modular linking works, see the way that
> Gandiva requests specific components from LLVM to use for static
> linking
> https://github.com/apache/arrow/blob/master/cpp/cmake_modules/FindLLVM.cmake#L53
>
> * Net simpler linking and deployment. Only one shared library to deal with
>
> There are some drawbacks, however:
>
> * Our C++ Linux packaging approach would need to be changed to be more
> LLVM-like (a single .deb/.yum package containing the C++ platform
> rather than many packages as now)
>
> Interested to hear from other C++ developers.
>
> Thanks
> Wes
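
For readers who have not seen the pattern, the per-library visibility
macro and the explicit template instantiation discussed in this thread
can be sketched roughly as below. This is only an illustration under
stated assumptions: "mylib", MYLIB_EXPORT, MYLIB_EXPORTING, AddOne and
Accumulator are hypothetical names, not taken from Arrow or Boost, and
header and source are collapsed into one file so that it compiles on
its own.

```cpp
// Hypothetical "mylib" module; none of these names come from Arrow or Boost.

// Normally the build system defines this only while compiling the library
// itself. It is defined here so the sketch builds as a single file.
#define MYLIB_EXPORTING

// One macro per shared library: export while building it, import otherwise.
#if defined(_WIN32) || defined(__CYGWIN__)
#  if defined(MYLIB_EXPORTING)
#    define MYLIB_EXPORT __declspec(dllexport)
#  else
#    define MYLIB_EXPORT __declspec(dllimport)
#  endif
#else
#  define MYLIB_EXPORT __attribute__((visibility("default")))
#endif

// An ordinary public function marked with the library's macro.
MYLIB_EXPORT int AddOne(int x) { return x + 1; }

// A template that the library wants to instantiate once, in one place.
template <typename T>
struct Accumulator {
  T total{};
  void Add(T value) { total += value; }
  T Total() const { return total; }
};

// "Header" side: tells users not to instantiate Accumulator<int> implicitly
// because the library provides it.
extern template struct Accumulator<int>;

// "Source" side: the single explicit instantiation. If this line is
// forgotten, users may hit undefined symbols at link time. A real shared
// library build would also need an export annotation here, and its exact
// placement may differ between compilers; that is omitted in this sketch.
template struct Accumulator<int>;
```

A caller would use Accumulator<int> as usual (Add, then Total); the
point is only that the instantiation lives in exactly one library, which
is the step that is easy to forget.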