[
https://issues.apache.org/jira/browse/ARROW-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184682#comment-17184682
]
Lawrence Chan commented on ARROW-9820:
--------------------------------------
I agree lifetimes with C-based plugins require some care to get correct, but I
think it is something we can design to be relatively safe for the end user. I
have some work in progress that I can push up to a PR draft and it may be
easier to discuss with some code in hand. The general gist of it is that
anything allocated by the plugin will be immediately wrapped in safer C++
owning objects that will handle destruction. There will also be ABI versioning
so that we have an upgrade path for future backwards-incompatible changes and
are protected from dangerous ABI mismatches. I think some of this will be
clearer once I get that PR pushed up.
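To make the idea a bit more concrete before the draft exists, here is a rough
sketch of the two pieces described above: an ABI version check, and wrapping a
plugin-allocated handle in a C++ owning object right away. Everything in it is
hypothetical and for illustration only (DemoPluginVTable, DemoFileHandle,
DEMO_PLUGIN_ABI_VERSION, the demo_* functions, PluginFile, CheckedPlugin); it
is not the proposed Arrow interface. The "plugin" is an in-memory stand-in
compiled into the same file so the example is self-contained; a real one would
presumably live in a shared library.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <string>

// ---- Hypothetical C ABI shared by host and plugin (names are made up) ----
extern "C" {
struct DemoFileHandle;  // opaque to the host

struct DemoPluginVTable {
  std::uint32_t abi_version;  // must match the version the host was built with
  DemoFileHandle* (*open)(const char* path);
  std::int64_t (*read)(DemoFileHandle* h, void* buf, std::int64_t nbytes);
  void (*close)(DemoFileHandle* h);
};
}
constexpr std::uint32_t DEMO_PLUGIN_ABI_VERSION = 1;

// ---- Stand-in plugin: an in-memory "file" whose content echoes the path ----
struct DemoFileHandle {
  std::string data;
  std::size_t pos;
};
int g_live_handles = 0;  // lets the demo observe that close() really runs

extern "C" {
DemoFileHandle* demo_open(const char* path) {
  ++g_live_handles;
  return new DemoFileHandle{std::string("contents of ") + path, 0};
}
std::int64_t demo_read(DemoFileHandle* h, void* buf, std::int64_t nbytes) {
  std::int64_t remaining = static_cast<std::int64_t>(h->data.size() - h->pos);
  std::int64_t n = nbytes < remaining ? nbytes : remaining;
  if (n <= 0) return 0;
  std::memcpy(buf, h->data.data() + h->pos, static_cast<std::size_t>(n));
  h->pos += static_cast<std::size_t>(n);
  return n;
}
void demo_close(DemoFileHandle* h) {
  delete h;
  --g_live_handles;
}
const DemoPluginVTable* demo_plugin_vtable() {
  static const DemoPluginVTable vt = {DEMO_PLUGIN_ABI_VERSION, demo_open,
                                      demo_read, demo_close};
  return &vt;
}
}

// ---- Host side ----
// Reject a missing or incompatible plugin before calling into it.
inline const DemoPluginVTable& CheckedPlugin(const DemoPluginVTable* vt) {
  if (vt == nullptr || vt->abi_version != DEMO_PLUGIN_ABI_VERSION) {
    throw std::runtime_error("plugin ABI version mismatch");
  }
  return *vt;
}

// Move-only owner: the raw handle goes straight into a unique_ptr whose
// deleter is the plugin's own close(), so destruction is automatic.
class PluginFile {
 public:
  PluginFile(const DemoPluginVTable& vt, const std::string& path)
      : handle_(vt.open(path.c_str()), vt.close), read_(vt.read) {
    if (!handle_) throw std::runtime_error("plugin failed to open " + path);
  }
  std::int64_t Read(void* buf, std::int64_t nbytes) {
    return read_(handle_.get(), buf, nbytes);
  }

 private:
  std::unique_ptr<DemoFileHandle, void (*)(DemoFileHandle*)> handle_;
  std::int64_t (*read_)(DemoFileHandle*, void*, std::int64_t);
};
```

The point of the sketch is that user code only ever sees PluginFile, so the
plugin's close() runs exactly once when the object goes out of scope, and a
plugin built against a different ABI version is rejected up front instead of
being called through a mismatched struct layout.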
For context about our use case: we have an in-house data storage system that
can read/write files via a userspace library, and it has a fair amount of
overlap with arrow::fs stuff in spirit. I wrote OutputStream +
RandomAccessFile subclasses and got the I/O working fine, but once I started
looking at the pyarrow bindings and the dataset stuff I realized the other
required changes would need to be hardcoded in a way that will be very
difficult for me to maintain down the road, so I started thinking about
pluggable storage drivers.
> [C++] Plugin Architecture for Filesystem and File IO
> ----------------------------------------------------
>
> Key: ARROW-9820
> URL: https://issues.apache.org/jira/browse/ARROW-9820
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Lawrence Chan
> Priority: Minor
>
> Adding a new custom filesystem with corresponding file I/O streams is quite a
> process at the moment. Looks like HDFS and S3FS are basically hardcoded in
> many places. It would be useful to develop a plugin system to allow users to
> interface with other data stores without maintaining a permanent fork with
> hardcoded changes.
> We can either do runtime plugins or compile-time plugins. Runtime is more
> user-friendly, but with C++, ABI compatibility is fairly delicate. So we
> would either want to use a C ABI or accept a you're-on-your-own situation
> where the user is expected to be very careful with versioning and compiler
> flags.
> With compile-time plugins, maybe there's a way to have the CMake machinery
> build third party code and also register those new URI schemes automatically.