Yes, I don't think we should go all the way to splitting Arrow into
micro-components.  The IO and IPC layers aren't heavyweight.  We should
simply address the most often-quoted annoyances.

Regards

Antoine.


On 20/09/2019 at 17:41, Wes McKinney wrote:
> Implementing the format fully requires memory management and IO
> interfaces (i.e. arrow/io/{file.h, interfaces.h, memory.h}). So those
> parts are not separable.
> 
> On Fri, Sep 20, 2019 at 10:36 AM Neal Richardson
> <neal.p.richard...@gmail.com> wrote:
>>
>> I wonder if having a core "format" C++ library, which the io, compute,
>> etc. library/libraries would depend on, is a natural step.
>> Particularly since we're coming up on 1.0 and the format is being
>> declared stable.
>>
>> Neal
>>
>> On Fri, Sep 20, 2019 at 8:28 AM Wes McKinney <wesmck...@gmail.com> wrote:
>>>
>>> We would have to be even more careful about managing symbol exports.
>>> Third party projects would need to link more libraries in their
>>> applications (not unlike the way that Boost works now -- I suppose
>>> that Boost is the closest analogue to what we're going for)
>>>
>>> On Fri, Sep 20, 2019 at 2:30 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>>>
>>>>> We could indeed split up libarrow into more shared libraries. This
>>>>> would mean accepting a lot more maintenance effort though, on a team
>>>>> that is already overburdened. I'm not too keen on that in the short
>>>>> term.
>>>>
>>>>
>>>> Something to think about for the longer term.  What do you see as the
>>>> added maintenance here?
>>>>
>>>>
>>>> On Thu, Sep 19, 2019 at 5:38 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>
>>>>> hi Micah,
>>>>>
>>>>>
>>>>> On Thu, Sep 19, 2019 at 12:41 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> * Should optional components be "opt in", "opt out", or a mix?
>>>>>>> Currently it's a mix, and that's confusing for people. I think we
>>>>>>> should make them all "opt in".
>>>>>>
>>>>>> Agreed, they should all be opt in by default.  I think active developers
>>>>>> are quite adept at flipping the appropriate CMake flags.
>>>>>>
>>>>>
>>>>> Cool. I opened a tracking JIRA
>>>>> https://issues.apache.org/jira/browse/ARROW-6637 and attached many
>>>>> issues. Sorry for the new JIRA flood
>>>>>
>>>>>>
>>>>>>> * Do we want to bring the out-of-the-box core build down to zero
>>>>>>> dependencies, including not depending on boost::filesystem and
>>>>>>> possibly checking in the compiled Flatbuffers files. While it may be
>>>>>>> slightly more maintenance work, I think the optics of a
>>>>>>> "dependency-free" core build would be beneficial and help the project
>>>>>>> marketing-wise.
>>>>>>
>>>>>> I'm -0.5 on checking in generated artifacts, but this is mostly stylistic.
>>>>>> In the case of Flatbuffers it seems like we might be able to get away
>>>>>> with vendoring, since it should be mostly header-only.
>>>>>>
>>>>>> I would prefer to try to come up with more granular components and be
>>>>>> very conservative about what is "core".  I think it should be possible to
>>>>>> have a zero-dependency build if only MemoryPool, Buffers, Arrays and
>>>>>> ArrayBuilders are in a core package [1].  This, combined with the
>>>>>> discussion Antoine started on an ABI-compatible C layer, would make basic
>>>>>> interop within a process reasonable.  Moving up the stack to IPC and
>>>>>> files, there is probably a way to package headers separately from
>>>>>> implementations.  This would allow other projects wishing to integrate
>>>>>> with Arrow to bring their own implementations without the baggage of
>>>>>> boost::filesystem.  Would this leave anything besides "flatbuffers" as a
>>>>>> hard dependency to support IPC?
>>>>>>
>>>>>
>>>>> We could indeed split up libarrow into more shared libraries. This
>>>>> would mean accepting a lot more maintenance effort though, on a team
>>>>> that is already overburdened. I'm not too keen on that in the short
>>>>> term.
>>>>>
>>>>>> Thanks,
>>>>>> Micah
>>>>>>
>>>>>>
>>>>>> [1] It probably makes sense to go even further and separate out
>>>>>> MemoryPool and Buffer, so we can break the circular relationship between
>>>>>> parquet and arrow.
>>>>>
>>>>> I don't think this is possible even then, particularly in light of my
>>>>> recent work reading and writing Arrow columnar data "closer to the
>>>>> metal" inside Parquet, which yields performance improvements.
>>>>>
>>>>>>
>>>>>> On Wed, Sep 18, 2019 at 8:03 AM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>
>>>>>>> To be clear, I think we should make these changes right after 0.15.0 is
>>>>>>> released so we aren't playing whack-a-mole with our packaging scripts.
>>>>>>> I'm happy to take the lead on the work...
>>>>>>>
>>>>>>> On Wed, Sep 18, 2019 at 9:54 AM Antoine Pitrou <solip...@pitrou.net>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On Wed, 18 Sep 2019 09:46:54 -0500
>>>>>>>> Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>>>> I think these are both interesting areas to explore further. I'd like
>>>>>>>>> to focus on the couple of immediate items I think we should address
>>>>>>>>>
>>>>>>>>> * Should optional components be "opt in", "opt out", or a mix?
>>>>>>>>> Currently it's a mix, and that's confusing for people. I think we
>>>>>>>>> should make them all "opt in".
>>>>>>>>> * Do we want to bring the out-of-the-box core build down to zero
>>>>>>>>> dependencies, including not depending on boost::filesystem and
>>>>>>>>> possibly checking in the compiled Flatbuffers files. While it may be
>>>>>>>>> slightly more maintenance work, I think the optics of a
>>>>>>>>> "dependency-free" core build would be beneficial and help the project
>>>>>>>>> marketing-wise.
>>>>>>>>>
>>>>>>>>> Both of these issues must be addressed whether we undertake a Bazel
>>>>>>>>> implementation or some other refactor of the C++ build system.
>>>>>>>>
>>>>>>>> I think checking in the Flatbuffers files (and also Protobuf and Thrift
>>>>>>>> where applicable :-)) would be fine.
>>>>>>>>
>>>>>>>> As for boost::filesystem, getting rid of it wouldn't be a huge task.
>>>>>>>> Still worth deciding whether we want to prioritize development time for
>>>>>>>> it, because it's not entirely trivial either.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> Antoine.
>>>>>>>>
>>>>>>>>
>>>>>>>
