[ 
https://issues.apache.org/jira/browse/ARROW-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183975#comment-17183975
 ] 

Antoine Pitrou commented on ARROW-9820:
---------------------------------------

Thanks for posting this. I agree it would be a good idea to allow adding custom 
filesystem implementations.

Some more comments:
1) Arrow C++ is one specific library implementing the Arrow format. Other Arrow 
implementations don't necessarily provide filesystem facilities. That said, the 
ones that bind around Arrow C++ (e.g. PyArrow) generally expose the facilities 
that exist in Arrow C++.
2) If using C rather than C++, how would we handle lifetime and ownership 
issues? That sounds like a can of worms. Arrow C++ is using C++ for a reason... 
(if someone OTOH wants to write a C Arrow implementation, nobody will object 
:-))
3) runtime vs. compile-time: people shouldn't have to recompile Arrow C++ to 
add a new filesystem type. If that's what you mean by "runtime", then let's do 
that. OTOH, it doesn't have to be a "zero configuration" thing (i.e. it's ok to 
have to call a registration function).
4) filesystem API stability: we can change the API assuming there are *good* 
reasons to change it. But that's orthogonal to this issue, and you should open 
separate JIRAs for that.
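
To make point 3 concrete, here is a rough sketch of what a registration-based 
approach could look like: a process-wide registry maps URI schemes to factory 
functions, so a new filesystem type is added by calling Register() at startup 
rather than by recompiling the library. All names below (FileSystem, 
FileSystemRegistry, InMemoryFileSystem) are hypothetical and are not the 
existing Arrow C++ API; factories return std::shared_ptr so that ownership 
stays explicit on the C++ side.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical minimal interface; NOT the real arrow::fs::FileSystem.
class FileSystem {
 public:
  virtual ~FileSystem() = default;
  virtual std::string type_name() const = 0;
};

// A factory builds a filesystem instance from a full URI string.
using FileSystemFactory =
    std::function<std::shared_ptr<FileSystem>(const std::string& uri)>;

// Hypothetical registry mapping URI schemes to factories.
class FileSystemRegistry {
 public:
  static FileSystemRegistry& Instance() {
    static FileSystemRegistry registry;  // initialized once, thread-safe
    return registry;
  }

  // Registers a factory for `scheme`. Returns false (and keeps the
  // existing entry) if the scheme was already registered.
  bool Register(const std::string& scheme, FileSystemFactory factory) {
    assert(static_cast<bool>(factory) && "factory must be callable");
    std::lock_guard<std::mutex> lock(mutex_);
    return factories_.emplace(scheme, std::move(factory)).second;
  }

  // Looks up the factory for the URI's scheme and invokes it.
  // Throws std::invalid_argument for a missing or unknown scheme.
  std::shared_ptr<FileSystem> FromUri(const std::string& uri) const {
    const auto pos = uri.find("://");
    if (pos == std::string::npos) {
      throw std::invalid_argument("URI has no scheme: " + uri);
    }
    const std::string scheme = uri.substr(0, pos);
    FileSystemFactory factory;
    {
      // Copy the factory out so it runs without holding the lock.
      std::lock_guard<std::mutex> lock(mutex_);
      const auto it = factories_.find(scheme);
      if (it == factories_.end()) {
        throw std::invalid_argument("No filesystem registered for scheme: " +
                                    scheme);
      }
      factory = it->second;
    }
    return factory(uri);
  }

 private:
  mutable std::mutex mutex_;
  std::map<std::string, FileSystemFactory> factories_;
};

// Hypothetical user-provided filesystem, defined outside the library.
class InMemoryFileSystem : public FileSystem {
 public:
  std::string type_name() const override { return "mem"; }
};
```

With something like this, user code would call 
FileSystemRegistry::Instance().Register("mem", ...) once at startup, and any 
later FromUri("mem://...") call would dispatch to the user's factory. Loading 
such code from a shared library at runtime would still run into the C++ ABI 
concerns raised in the issue description; the sketch only covers registration.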

Given all this, perhaps you could tell us a bit more about what kind of plugin 
API you're expecting or able to work with.

> [C++] Plugin Architecture for Filesystem and File IO
> ----------------------------------------------------
>
>                 Key: ARROW-9820
>                 URL: https://issues.apache.org/jira/browse/ARROW-9820
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Lawrence Chan
>            Priority: Minor
>
> Adding a new custom filesystem with corresponding file i/o streams is quite a 
> process at the moment.  Looks like HDFS and S3FS are basically hardcoded in 
> many places.  It would be useful to develop a plugin system to allow users to 
> interface with other data stores without maintaining a permanent fork with 
> hardcoded changes.
> We can either do runtime plugins or compile-time plugins.  Runtime is more 
> user-friendly, but with C++, ABI compatibility is fairly delicate.  So we 
> would either want to use a C ABI or accept a youre-on-your-own situation 
> where the user is expected to be very careful with versioning and compiler 
> flags.
> With compile-time plugins, maybe there's a way to have the cmake machinery 
> build third party code and also register those new URI schemes automatically.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
