[https://issues.apache.org/jira/browse/ARROW-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183975#comment-17183975]
Antoine Pitrou edited comment on ARROW-9820 at 8/25/20, 12:20 PM:
------------------------------------------------------------------
Thanks for posting this. I agree it would be a good idea to allow adding custom
filesystem implementations.
Some more comments:
1) Arrow C++ is one specific library implementing the Arrow format. Other Arrow
implementations don't necessarily provide the same filesystem facilities. That
said, the ones that bind around Arrow C++ (e.g. PyArrow) generally expose the
facilities available in Arrow C++.
2) If using C rather than C++, how would we handle lifetime and ownership
issues? That sounds like a can of worms. Arrow C++ is using C++ for a reason...
(if someone OTOH wants to write a C Arrow implementation, nobody will object
:-))
3) runtime vs. compile-time: people shouldn't have to recompile Arrow C++ to
add a new filesystem type. If that's what you mean by "runtime", then let's do
that. OTOH, it doesn't have to be a "zero configuration" thing (i.e. it's ok to
have to call a registration function).
4) filesystem API stability: we can change the API assuming there are *good*
reasons to change it. But that's orthogonal to this issue, and you should open
separate JIRAs for that.
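To make point 2 concrete: across a C ABI nothing like a destructor or `std::shared_ptr` can cross the boundary, so ownership has to be encoded as a convention that both sides follow by hand. The sketch below is hypothetical (`FsPlugin`, `FsPluginFile`, `PluginFile` and `OpenInput` are not Arrow APIs). It borrows the `release`-callback and "move by setting `release` to NULL" convention of the Arrow C Data Interface, adds an `abi_version` field as one possible way to reject incompatible plugins, and wraps the raw struct in a C++ RAII class on the host side.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical C-compatible plugin interface (NOT an Arrow API). A C struct
// cannot carry a destructor, so ownership is a convention: whoever holds a
// populated FsPluginFile must call release() exactly once.
extern "C" {
struct FsPluginFile {
  void* private_data;  // plugin-owned state, opaque to the host
  // Reads up to `nbytes` into `out`; returns bytes read (0 at EOF) or -1.
  int64_t (*read)(struct FsPluginFile* self, void* out, int64_t nbytes);
  // Frees private_data and sets release to NULL to mark the struct released.
  void (*release)(struct FsPluginFile* self);
};

struct FsPlugin {
  uint32_t abi_version;  // checked by the host before any other call
  // Opens `path`; on success populates `*out` and returns 0.
  int (*open_input)(const char* path, struct FsPluginFile* out);
};
}

// --- toy in-memory plugin; a real one would live in a separate shared library ---
namespace {
struct MemState {
  std::string data;
  int64_t pos;
};
}  // namespace

extern "C" {
static int64_t MemRead(FsPluginFile* self, void* out, int64_t nbytes) {
  if (nbytes < 0) return -1;
  auto* st = static_cast<MemState*>(self->private_data);
  const int64_t avail = static_cast<int64_t>(st->data.size()) - st->pos;
  const int64_t n = nbytes < avail ? nbytes : avail;
  std::memcpy(out, st->data.data() + st->pos, static_cast<size_t>(n));
  st->pos += n;
  return n;
}

static void MemRelease(FsPluginFile* self) {
  delete static_cast<MemState*>(self->private_data);
  self->private_data = nullptr;
  self->release = nullptr;  // released: must not be used again
}

static int MemOpen(const char* path, FsPluginFile* out) {
  out->private_data = new MemState{std::string("contents of ") + path, 0};
  out->read = MemRead;
  out->release = MemRelease;
  return 0;
}
}

// --- host side ---
constexpr uint32_t kHostAbiVersion = 1;

// RAII wrapper: takes over the struct and guarantees release() runs once.
class PluginFile {
 public:
  explicit PluginFile(FsPluginFile* raw) : raw_(*raw) {
    raw->release = nullptr;  // the caller's copy is now "moved from"
  }
  PluginFile(const PluginFile&) = delete;
  PluginFile& operator=(const PluginFile&) = delete;
  ~PluginFile() {
    if (raw_.release != nullptr) raw_.release(&raw_);
  }
  int64_t Read(void* out, int64_t nbytes) {
    assert(raw_.release != nullptr && "read after release");
    return raw_.read(&raw_, out, nbytes);
  }

 private:
  FsPluginFile raw_;
};

std::unique_ptr<PluginFile> OpenInput(const FsPlugin& plugin, const std::string& path) {
  if (plugin.abi_version != kHostAbiVersion) {
    throw std::runtime_error("plugin ABI version mismatch");
  }
  FsPluginFile raw{};
  if (plugin.open_input(path.c_str(), &raw) != 0) {
    throw std::runtime_error("plugin could not open " + path);
  }
  return std::make_unique<PluginFile>(&raw);
}
```

Every rule here (who frees `private_data`, when a struct counts as moved, what happens on a version mismatch) lives only in comments and discipline on both sides, which is the "can of worms" point 2 refers to; the C++ wrapper merely hides it from host code.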
Given all this, perhaps you could tell us a bit more about what kind of plugin
API you're expecting or able to work with.
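One possible shape for the registration function mentioned in point 3, sketched under assumptions: `FileSystem`, `FileSystemFactory`, `FileSystemRegistry`, `MyStoreFileSystem` and `RegisterMyStore` are hypothetical stand-ins, not the Arrow C++ API. A plugin makes one explicit call that maps a URI scheme to a factory, and the host resolves URIs through that table rather than through hardcoded per-filesystem branches.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a filesystem base class (not arrow::fs::FileSystem).
class FileSystem {
 public:
  virtual ~FileSystem() = default;
  virtual std::string type_name() const = 0;
};

// A factory builds a filesystem from the full URI (scheme, host, options...).
using FileSystemFactory =
    std::function<std::unique_ptr<FileSystem>(const std::string& uri)>;

// Process-wide scheme -> factory table consulted when resolving URIs.
class FileSystemRegistry {
 public:
  static FileSystemRegistry& Instance() {
    static FileSystemRegistry instance;  // initialized once, thread-safe since C++11
    return instance;
  }

  // The explicit registration call a plugin makes at startup.
  void Register(const std::string& scheme, FileSystemFactory factory) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!factories_.emplace(scheme, std::move(factory)).second) {
      throw std::invalid_argument("scheme already registered: " + scheme);
    }
  }

  // Resolves "scheme://..." through the table instead of a hardcoded switch.
  std::unique_ptr<FileSystem> FromUri(const std::string& uri) const {
    const auto sep = uri.find("://");
    if (sep == std::string::npos) {
      throw std::invalid_argument("URI has no scheme: " + uri);
    }
    FileSystemFactory factory;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = factories_.find(uri.substr(0, sep));
      if (it == factories_.end()) {
        throw std::invalid_argument("unknown scheme in URI: " + uri);
      }
      factory = it->second;  // copy, so the factory runs without the lock held
    }
    return factory(uri);
  }

 private:
  mutable std::mutex mutex_;
  std::map<std::string, FileSystemFactory> factories_;
};

// --- third-party code; in practice compiled separately from the host library ---
class MyStoreFileSystem : public FileSystem {
 public:
  explicit MyStoreFileSystem(std::string uri) : uri_(std::move(uri)) {}
  std::string type_name() const override { return "mystore"; }
  const std::string& uri() const { return uri_; }

 private:
  std::string uri_;
};

void RegisterMyStore() {
  FileSystemRegistry::Instance().Register(
      "mystore", [](const std::string& uri) -> std::unique_ptr<FileSystem> {
        return std::make_unique<MyStoreFileSystem>(uri);
      });
}
```

After `RegisterMyStore()` has been called once, `FileSystemRegistry::Instance().FromUri("mystore://bucket/key")` returns a `MyStoreFileSystem` without any change to the host code. Because the table stores C++ objects, a plugin built this way still has to match the host's compiler and standard-library ABI, which is the trade-off the issue raises for runtime plugins.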
> [C++] Plugin Architecture for Filesystem and File IO
> ----------------------------------------------------
>
> Key: ARROW-9820
> URL: https://issues.apache.org/jira/browse/ARROW-9820
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Lawrence Chan
> Priority: Minor
>
> Adding a new custom filesystem with corresponding file I/O streams is quite a
> process at the moment. It looks like HDFS and S3FS are basically hardcoded in
> many places. It would be useful to develop a plugin system to allow users to
> interface with other data stores without maintaining a permanent fork with
> hardcoded changes.
> We can either do runtime plugins or compile-time plugins. Runtime is more
> user-friendly, but with C++, ABI compatibility is fairly delicate. So we
> would either want to use a C ABI or accept a you're-on-your-own situation
> where the user is expected to be very careful with versioning and compiler
> flags.
> With compile-time plugins, maybe there's a way to have the cmake machinery
> build third party code and also register those new URI schemes automatically.
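For the compile-time variant, a common C++ idiom for registering URI schemes automatically is a static "registrar" object whose constructor runs during static initialization, before `main()`. The sketch assumes hypothetical names (`SchemeTable`, `SchemeRegistrar`, the `mystore` scheme) and returns a string instead of a real filesystem to stay short; the CMake side (building third-party sources into the host) is not shown.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical factory type; a real one would return a filesystem object.
using Factory = std::function<std::string(const std::string& uri)>;

// A function-local static avoids the static-initialization-order problem:
// the map is constructed on first use, even when a registrar in another
// translation unit runs before this file's globals are initialized.
std::map<std::string, Factory>& SchemeTable() {
  static std::map<std::string, Factory> table;
  return table;
}

// Each plugin defines one static SchemeRegistrar; its constructor runs
// during static initialization, so linking the plugin in registers it.
// A later registration of the same scheme silently replaces the earlier one.
struct SchemeRegistrar {
  SchemeRegistrar(const std::string& scheme, Factory factory) {
    SchemeTable()[scheme] = std::move(factory);
  }
};

// --- would live in a third-party source file added to the build ---
namespace {
const SchemeRegistrar kMyStoreRegistrar("mystore", [](const std::string& uri) {
  return "mystore handler for " + uri;
});
}  // namespace
```

One caveat with this design: if the plugin is linked as a static library, the linker may drop an object file that nothing else references, so the registrar never runs. Linking the plugin's objects directly, or using a whole-archive linker option, avoids that.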