[ 
https://issues.apache.org/jira/browse/ARROW-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184682#comment-17184682
 ] 

Lawrence Chan commented on ARROW-9820:
--------------------------------------

I agree that object lifetimes with C-based plugins take some care to get 
right, but I think we can design this to be relatively safe for the end user.  
I have some work in progress that I can push up as a draft PR, and it may be 
easier to discuss with some code in hand.  The gist is that anything allocated 
by the plugin is immediately wrapped in a safer C++ owning object that handles 
destruction.  There will also be ABI versioning, so that future 
backwards-incompatible changes have an upgrade path and are protected from 
dangerous ABI mismatches.  I think some of this will be clearer once I get 
that PR pushed up.
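
A minimal sketch of the two ideas above (owning wrappers plus an ABI version 
check), for discussion only.  It is not taken from the work-in-progress PR: 
DemoFsPlugin, DemoFileHandle, PluginFile and kDemoAbiVersion are hypothetical 
names, and the "plugin" is faked in-process instead of being loaded from a 
shared library.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical C ABI a plugin would export. All names here are invented
// for illustration and are not part of Arrow.
extern "C" {
typedef struct DemoFileHandle DemoFileHandle;  // opaque to the host

typedef struct DemoFsPlugin {
  uint32_t abi_version;  // the host checks this before calling anything else
  DemoFileHandle* (*open_file)(const char* path);
  void (*close_file)(DemoFileHandle* handle);
} DemoFsPlugin;
}

// ABI version the host was built against (assumed value).
constexpr uint32_t kDemoAbiVersion = 1;

// Host-side owning wrapper: the raw handle returned by the plugin is wrapped
// immediately, and the plugin's own close function runs on destruction.
class PluginFile {
 public:
  static PluginFile Open(const DemoFsPlugin& plugin, const std::string& path) {
    if (plugin.abi_version != kDemoAbiVersion) {
      throw std::runtime_error("plugin ABI version mismatch");
    }
    assert(plugin.open_file != nullptr && plugin.close_file != nullptr);
    DemoFileHandle* raw = plugin.open_file(path.c_str());
    if (raw == nullptr) {
      throw std::runtime_error("plugin failed to open " + path);
    }
    return PluginFile(raw, plugin.close_file);
  }

  DemoFileHandle* get() const { return handle_.get(); }

 private:
  using CloseFn = decltype(DemoFsPlugin::close_file);
  PluginFile(DemoFileHandle* raw, CloseFn close) : handle_(raw, close) {}

  std::unique_ptr<DemoFileHandle, CloseFn> handle_;
};

// Toy in-process "plugin" so the sketch is self-contained. A real plugin
// would live in a separate shared library.
struct DemoFileHandle {
  std::string path;
};

static int g_open_handles = 0;  // counts live handles, for demonstration

extern "C" {
static DemoFileHandle* DemoOpen(const char* path) {
  ++g_open_handles;
  return new DemoFileHandle{path};
}

static void DemoClose(DemoFileHandle* handle) {
  --g_open_handles;
  delete handle;
}
}
```

With this shape, host code never calls close_file directly: a PluginFile 
going out of scope releases the handle through the plugin that allocated it, 
and a plugin reporting a different abi_version is rejected before any of its 
functions are called.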

For context about our use case: we have an in-house data storage system that 
can read/write files via a userspace library, and it has a fair amount of 
overlap with arrow::fs stuff in spirit.  I wrote OutputStream + 
RandomAccessFile subclasses and got the I/O working fine, but once I started 
looking at the pyarrow bindings and the dataset stuff I realized the other 
required changes would need to be hardcoded in a way that will be very 
difficult for me to maintain down the road, so I started thinking about 
pluggable storage drivers.

> [C++] Plugin Architecture for Filesystem and File IO
> ----------------------------------------------------
>
>                 Key: ARROW-9820
>                 URL: https://issues.apache.org/jira/browse/ARROW-9820
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Lawrence Chan
>            Priority: Minor
>
> Adding a new custom filesystem with corresponding file i/o streams is quite a 
> process at the moment.  Looks like HDFS and S3FS are basically hardcoded in 
> many places.  It would be useful to develop a plugin system to allow users to 
> interface with other data stores without maintaining a permanent fork with 
> hardcoded changes.
> We can either do runtime plugins or compile-time plugins.  Runtime is more 
> user-friendly, but with C++, ABI compatibility is fairly delicate.  So we 
> would either want to use a C ABI or accept a you're-on-your-own situation 
> where the user is expected to be very careful with versioning and compiler 
> flags.
> With compile-time plugins, maybe there's a way to have the CMake machinery 
> build third-party code and also register those new URI schemes automatically.


