I wonder if having a core "format" C++ library, which the io, compute,
etc. library/libraries would depend on, is a natural step.
Particularly since we're coming up on 1.0 and the format is being
declared stable.

Neal

On Fri, Sep 20, 2019 at 8:28 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> We would have to be even more careful about managing symbol exports.
> Third party projects would need to link more libraries in their
> applications (not unlike the way that Boost works now -- I suppose
> that Boost is the closest analogue to what we're going for)
>
> On Fri, Sep 20, 2019 at 2:30 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
> >>
> >> We could indeed split up libarrow into more shared libraries. This
> >> would mean accepting a lot more maintenance effort though, on a team
> >> that is already overburdened. I'm not too keen on that in the short
> >> term.
> >
> >
> > > Something to think about for the longer term. What do you see as the
> > > added maintenance here?
> >
> >
> > On Thu, Sep 19, 2019 at 5:38 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >>
> >> hi Micah,
> >>
> >>
> >> On Thu, Sep 19, 2019 at 12:41 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
> >> >
> >> > >
> >> > > * Should optional components be "opt in", "opt out", or a mix?
> >> > > Currently it's a mix, and that's confusing for people. I think we
> >> > > should make them all "opt in".
> >> >
> >> > Agreed, they should all be opt in by default. I think active developers
> >> > are quite adept at flipping the appropriate CMake flags.
> >> >
> >>
> >> Cool. I opened a tracking JIRA
> >> https://issues.apache.org/jira/browse/ARROW-6637 and attached many
> >> issues. Sorry for the new JIRA flood
> >>
> >> >
> >> > > * Do we want to bring the out-of-the-box core build down to zero
> >> > > dependencies, including not depending on boost::filesystem and
> >> > > possibly checking in the compiled Flatbuffers files. While it may be
> >> > > slightly more maintenance work, I think the optics of a
> >> > > "dependency-free" core build would be beneficial and help the project
> >> > > marketing-wise.
> >> >
> >> > I'm -.5 on checking in generated artifacts, but this is mostly stylistic.
> >> > In the case of flatbuffers it seems like we might be able to get away
> >> > with vendoring, since it should be mostly header-only.
> >> >
> >> > I would prefer to try to come up with more granular components and be
> >> > very conservative about what is "core". I think it should be possible to
> >> > have a zero-dependency build if only MemoryPool, Buffers, Arrays and
> >> > ArrayBuilders are in a core package [1]. This, combined with the
> >> > discussion Antoine started on an ABI-compatible C layer, would make basic
> >> > inter-op within a process reasonable. Moving up the stack to IPC and
> >> > files, there is probably a way to package headers separately from
> >> > implementations. This would allow other projects wishing to integrate
> >> > with Arrow to bring their own implementations without the baggage of
> >> > boost::filesystem. Would this leave anything besides "flatbuffers" as a
> >> > hard dependency to support IPC?
> >> >
> >>
> >> We could indeed split up libarrow into more shared libraries. This
> >> would mean accepting a lot more maintenance effort though, on a team
> >> that is already overburdened. I'm not too keen on that in the short
> >> term.
> >>
> >> > Thanks,
> >> > Micah
> >> >
> >> >
> >> > [1] It probably makes sense to go even further and separate out
> >> > MemoryPool and Buffer, so we can break the circular relationship between
> >> > parquet and arrow.
> >>
> >> Don't think this is possible even then, particularly in light of my
> >> recent work reading and writing Arrow columnar data "closer to the
> >> metal"  inside Parquet, yielding beneficial performance improvements.
> >>
> >> >
> >> > On Wed, Sep 18, 2019 at 8:03 AM Wes McKinney <wesmck...@gmail.com> wrote:
> >> >
> >> > > To be clear I think we should make these changes right after 0.15.0 is
> >> > > released so we aren't playing whackamole with our packaging scripts.
> >> > > I'm happy to take the lead on the work...
> >> > >
> >> > > On Wed, Sep 18, 2019 at 9:54 AM Antoine Pitrou <solip...@pitrou.net> wrote:
> >> > > >
> >> > > > On Wed, 18 Sep 2019 09:46:54 -0500
> >> > > > Wes McKinney <wesmck...@gmail.com> wrote:
> >> > > > > I think these are both interesting areas to explore further. I'd like
> >> > > > > to focus on the couple of immediate items I think we should address:
> >> > > > >
> >> > > > > * Should optional components be "opt in", "opt out", or a mix?
> >> > > > > Currently it's a mix, and that's confusing for people. I think we
> >> > > > > should make them all "opt in".
> >> > > > > * Do we want to bring the out-of-the-box core build down to zero
> >> > > > > dependencies, including not depending on boost::filesystem and
> >> > > > > possibly checking in the compiled Flatbuffers files. While it may be
> >> > > > > slightly more maintenance work, I think the optics of a
> >> > > > > "dependency-free" core build would be beneficial and help the
> >> > > > > project marketing-wise.
> >> > > > >
> >> > > > > Both of these issues must be addressed whether we undertake a Bazel
> >> > > > > implementation or some other refactor of the C++ build system.
> >> > > >
> >> > > > I think checking in the Flatbuffers files (and also Protobuf and
> >> > > > Thrift where applicable :-)) would be fine.
> >> > > >
> >> > > > As for boost::filesystem, getting rid of it wouldn't be a huge task.
> >> > > > Still worth deciding whether we want to prioritize development time
> >> > > > for it, because it's not entirely trivial either.
> >> > > >
> >> > > > Regards
> >> > > >
> >> > > > Antoine.
> >> > > >
> >> > > >
> >> > >
