We would have to be even more careful about managing symbol exports. Third-party projects would need to link more libraries in their applications (not unlike the way that Boost works now -- I suppose that Boost is the closest analogue to what we're going for).
On Fri, Sep 20, 2019 at 2:30 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>
>> We could indeed split up libarrow into more shared libraries. This
>> would mean accepting a lot more maintenance effort though, on a team
>> that is already overburdened. I'm not too keen on that in the short
>> term.
>
>
> Something for longer term to think about. What are you seeing as the added
> maintenance here?
>
>
> On Thu, Sep 19, 2019 at 5:38 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>
>> hi Micah,
>>
>>
>> On Thu, Sep 19, 2019 at 12:41 AM Micah Kornfield <emkornfi...@gmail.com>
>> wrote:
>> >
>> > >
>> > > * Should optional components be "opt in", "opt out", or a mix?
>> > > Currently it's a mix, and that's confusing for people. I think we
>> > > should make them all "opt in".
>> >
>> > Agreed they should all be opt in by default. I think active developers are
>> > quite adept at flipping the appropriate CMake flags.
>> >
>>
>> Cool. I opened a tracking JIRA
>> https://issues.apache.org/jira/browse/ARROW-6637 and attached many
>> issues. Sorry for the new JIRA flood
>>
>> >
>> > > * Do we want to bring the out-of-the-box core build down to zero
>> > > dependencies, including not depending on boost::filesystem and
>> > > possibly checking in the compiled Flatbuffers files? While it may be
>> > > slightly more maintenance work, I think the optics of a
>> > > "dependency-free" core build would be beneficial and help the project
>> > > marketing-wise.
>> >
>> > I'm -.5 on checking in generated artifacts but this is mostly stylistic.
>> > In the case of flatbuffers it seems like we might be able to get away with
>> > vendoring since it should mostly be headers only.
>> >
>> > I would prefer to try to come up with more granular components and be
>> > very conservative on what is "core". I think it should be possible to have a
>> > zero dependency build if only MemoryPool, Buffers, Arrays and ArrayBuilders
>> > are in a core package [1]. This combined with discussion Antoine started on an
>> > ABI compatible C-layer would make basic inter-op within a process
>> > reasonable. Moving up the stack to IPC and files, there is probably a way
>> > to package headers separately from implementations. This would allow other
>> > projects wishing to integrate with Arrow to bring their own implementations
>> > without the baggage of boost::filesystem. Would this leave anything besides
>> > "flatbuffers" as a hard dependency to support IPC?
>> >
>>
>> We could indeed split up libarrow into more shared libraries. This
>> would mean accepting a lot more maintenance effort though, on a team
>> that is already overburdened. I'm not too keen on that in the short
>> term.
>>
>> > Thanks,
>> > Micah
>> >
>> >
>> > [1] It probably makes sense to go even further and separate out MemoryPool
>> > and Buffer, so we can break the circular relationship between parquet and
>> > arrow.
>>
>> Don't think this is possible even then, particularly in light of my
>> recent work reading and writing Arrow columnar data "closer to the
>> metal" inside Parquet, yielding beneficial performance improvements.
>>
>> >
>> > On Wed, Sep 18, 2019 at 8:03 AM Wes McKinney <wesmck...@gmail.com> wrote:
>> >
>> > > To be clear I think we should make these changes right after 0.15.0 is
>> > > released so we aren't playing whack-a-mole with our packaging scripts.
>> > > I'm happy to take the lead on the work...
>> > >
>> > > On Wed, Sep 18, 2019 at 9:54 AM Antoine Pitrou <solip...@pitrou.net>
>> > > wrote:
>> > > >
>> > > > On Wed, 18 Sep 2019 09:46:54 -0500
>> > > > Wes McKinney <wesmck...@gmail.com> wrote:
>> > > > > I think these are both interesting areas to explore further. I'd like
>> > > > > to focus on the couple of immediate items I think we should address
>> > > > >
>> > > > > * Should optional components be "opt in", "opt out", or a mix?
>> > > > > Currently it's a mix, and that's confusing for people. I think we
>> > > > > should make them all "opt in".
>> > > > > * Do we want to bring the out-of-the-box core build down to zero
>> > > > > dependencies, including not depending on boost::filesystem and
>> > > > > possibly checking in the compiled Flatbuffers files? While it may be
>> > > > > slightly more maintenance work, I think the optics of a
>> > > > > "dependency-free" core build would be beneficial and help the project
>> > > > > marketing-wise.
>> > > > >
>> > > > > Both of these issues must be addressed whether we undertake a Bazel
>> > > > > implementation or some other refactor of the C++ build system.
>> > > >
>> > > > I think checking in the Flatbuffers files (and also Protobuf and Thrift
>> > > > where applicable :-)) would be fine.
>> > > >
>> > > > As for boost::filesystem, getting rid of it wouldn't be a huge task.
>> > > > Still worth deciding whether we want to prioritize development time for
>> > > > it, because it's not entirely trivial either.
>> > > >
>> > > > Regards
>> > > >
>> > > > Antoine.
>> > > >