Yes, I don't think we should go the full way of separating Arrow into micro-components. The IO and IPC layers aren't heavyweight. We should simply address the most often-quoted annoyances.

Regards

Antoine.


On 20/09/2019 at 17:41, Wes McKinney wrote:
> Implementing the format fully requires memory management and IO
> interfaces (i.e. arrow/io/{file.h, interfaces.h, memory.h}). So those
> parts are not separable.
>
> On Fri, Sep 20, 2019 at 10:36 AM Neal Richardson
> <neal.p.richard...@gmail.com> wrote:
>>
>> I wonder if having a core "format" C++ library, which the io, compute,
>> etc. library/libraries would depend on, is a natural step.
>> Particularly since we're coming up on 1.0 and the format is being
>> declared stable.
>>
>> Neal
>>
>> On Fri, Sep 20, 2019 at 8:28 AM Wes McKinney <wesmck...@gmail.com> wrote:
>>>
>>> We would have to be even more careful about managing symbol exports.
>>> Third party projects would need to link more libraries in their
>>> applications (not unlike the way that Boost works now -- I suppose
>>> that Boost is the closest analogue to what we're going for)
>>>
>>> On Fri, Sep 20, 2019 at 2:30 AM Micah Kornfield <emkornfi...@gmail.com>
>>> wrote:
>>>>>
>>>>> We could indeed split up libarrow into more shared libraries. This
>>>>> would mean accepting a lot more maintenance effort though, on a team
>>>>> that is already overburdened. I'm not too keen on that in the short
>>>>> term.
>>>>
>>>>
>>>> Something for longer term to think about. What are you seeing as the
>>>> added maintenance here?
>>>>
>>>>
>>>> On Thu, Sep 19, 2019 at 5:38 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>
>>>>> hi Micah,
>>>>>
>>>>>
>>>>> On Thu, Sep 19, 2019 at 12:41 AM Micah Kornfield <emkornfi...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> * Should optional components be "opt in", "opt out", or a mix?
>>>>>>> Currently it's a mix, and that's confusing for people. I think we
>>>>>>> should make them all "opt in".
>>>>>>
>>>>>> Agreed they should all be opt in by default. I think active developers
>>>>>> are quite adept at flipping the appropriate CMake flags.
>>>>>>
>>>>>
>>>>> Cool. I opened a tracking JIRA
>>>>> https://issues.apache.org/jira/browse/ARROW-6637 and attached many
>>>>> issues. Sorry for the new JIRA flood
>>>>>
>>>>>>
>>>>>>> * Do we want to bring the out-of-the-box core build down to zero
>>>>>>> dependencies, including not depending on boost::filesystem and
>>>>>>> possibly checking in the compiled Flatbuffers files. While it may be
>>>>>>> slightly more maintenance work, I think the optics of a
>>>>>>> "dependency-free" core build would be beneficial and help the project
>>>>>>> marketing-wise.
>>>>>>
>>>>>> I'm -.5 on checking in generated artifacts but this is mostly stylistic.
>>>>>> In the case of flatbuffers it seems like we might be able to get away
>>>>>> with vendoring since it should mostly be headers only.
>>>>>>
>>>>>> I would prefer to try to come up with more granular components and be
>>>>>> very conservative on what is "core". I think it should be possible to
>>>>>> have a zero dependency build if only MemoryPool, Buffers, Arrays and
>>>>>> ArrayBuilders are in a core package [1]. This combined with the
>>>>>> discussion Antoine started on an ABI compatible C-layer would make
>>>>>> basic inter-op within a process reasonable. Moving up the stack to IPC
>>>>>> and files, there is probably a way to package headers separately from
>>>>>> implementations. This would allow other projects wishing to integrate
>>>>>> with Arrow to bring their own implementations without the baggage of
>>>>>> boost::filesystem. Would this leave anything besides "flatbuffers" as
>>>>>> a hard dependency to support IPC?
>>>>>>
>>>>>
>>>>> We could indeed split up libarrow into more shared libraries. This
>>>>> would mean accepting a lot more maintenance effort though, on a team
>>>>> that is already overburdened. I'm not too keen on that in the short
>>>>> term.
>>>>>
>>>>>> Thanks,
>>>>>> Micah
>>>>>>
>>>>>>
>>>>>> [1] It probably makes sense to go even further and separate out
>>>>>> MemoryPool and Buffer, so we can break the circular relationship
>>>>>> between parquet and arrow.
>>>>>
>>>>> Don't think this is possible even then, particularly in light of my
>>>>> recent work reading and writing Arrow columnar data "closer to the
>>>>> metal" inside Parquet, yielding beneficial performance improvements.
>>>>>
>>>>>>
>>>>>> On Wed, Sep 18, 2019 at 8:03 AM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>
>>>>>>> To be clear I think we should make these changes right after 0.15.0 is
>>>>>>> released so we aren't playing whackamole with our packaging scripts.
>>>>>>> I'm happy to take the lead on the work...
>>>>>>>
>>>>>>> On Wed, Sep 18, 2019 at 9:54 AM Antoine Pitrou <solip...@pitrou.net>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On Wed, 18 Sep 2019 09:46:54 -0500
>>>>>>>> Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>>>> I think these are both interesting areas to explore further. I'd like
>>>>>>>>> to focus on the couple of immediate items I think we should address
>>>>>>>>>
>>>>>>>>> * Should optional components be "opt in", "opt out", or a mix?
>>>>>>>>> Currently it's a mix, and that's confusing for people. I think we
>>>>>>>>> should make them all "opt in".
>>>>>>>>> * Do we want to bring the out-of-the-box core build down to zero
>>>>>>>>> dependencies, including not depending on boost::filesystem and
>>>>>>>>> possibly checking in the compiled Flatbuffers files. While it may be
>>>>>>>>> slightly more maintenance work, I think the optics of a
>>>>>>>>> "dependency-free" core build would be beneficial and help the project
>>>>>>>>> marketing-wise.
>>>>>>>>>
>>>>>>>>> Both of these issues must be addressed whether we undertake a Bazel
>>>>>>>>> implementation or some other refactor of the C++ build system.
>>>>>>>>
>>>>>>>> I think checking in the Flatbuffers files (and also Protobuf and Thrift
>>>>>>>> where applicable :-)) would be fine.
>>>>>>>>
>>>>>>>> As for boost::filesystem, getting rid of it wouldn't be a huge task.
>>>>>>>> Still worth deciding whether we want to prioritize development time for
>>>>>>>> it, because it's not entirely trivial either.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> Antoine.