jorisvandenbossche opened a new issue, #38309:
URL: https://github.com/apache/arrow/issues/38309

   Currently all the filesystems we support in C++ (local, HDFS, S3, Google 
Cloud Storage (gs/gcs), and soon Azure (abfs)) are built into the core libarrow 
library. For example, when enabled, they are hardcoded in 
[`FileSystemFromUri`](https://github.com/apache/arrow/blob/a5043e710939e7691bdd57087bf475df3bc0aa48/cpp/src/arrow/filesystem/filesystem.cc#L680-L724).
   
   There is a desire to separate those filesystems into their own 
libraries, so that they can be installed separately. The remote filesystems 
each come with their own (potentially quite large) dependencies, and one 
typically doesn't need all of them at the same time. 
   More generally, it would also be nice if filesystems _could_ be implemented 
externally, for filesystems we wouldn't consider including in the main Arrow 
project.
   
   My understanding is that this would require:
   
   * Some mechanism to "register" a filesystem with the core libarrow fs 
utilities. In particular, parsing from a URI (i.e. when the user doesn't pass an 
already instantiated filesystem object) needs to know, for each URI prefix, 
which filesystem implementation to dispatch to.
   * Building some of our own filesystems as separate libraries. I think this 
would mainly be the cloud filesystems (S3, Google Cloud, Azure), while the 
local filesystem would always be included in core libarrow (plus some of the 
composite ones like subtree). 
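   To make the first point concrete, here is a minimal sketch of one possible registration mechanism, assuming a process-wide registry that maps a URI scheme to a factory function. All names (`FileSystem`, `FileSystemFactory`, `FileSystemRegistry`, `S3LikeFileSystem`) are hypothetical and for illustration only; this is not Arrow's API. The self-registering static object at the bottom stands in for what a separately built filesystem library could do when it is loaded.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for arrow::fs::FileSystem (not Arrow's real class).
class FileSystem {
 public:
  virtual ~FileSystem() = default;
  virtual std::string type_name() const = 0;
};

// A factory builds a filesystem from the full URI (bucket, region, options...).
using FileSystemFactory =
    std::function<std::shared_ptr<FileSystem>(const std::string& uri)>;

// Hypothetical process-wide registry keyed by URI scheme ("s3", "gs", "abfs").
class FileSystemRegistry {
 public:
  static FileSystemRegistry& Instance() {
    static FileSystemRegistry registry;  // thread-safe init since C++11
    return registry;
  }

  // Returns false if a factory is already registered for this scheme.
  bool Register(const std::string& scheme, FileSystemFactory factory) {
    assert(factory && "factory must be callable");
    std::lock_guard<std::mutex> lock(mutex_);
    return factories_.emplace(scheme, std::move(factory)).second;
  }

  // Dispatches on the scheme prefix; URIs without "://" are treated as local.
  std::shared_ptr<FileSystem> FromUri(const std::string& uri) const {
    const auto pos = uri.find("://");
    const std::string scheme =
        pos == std::string::npos ? "file" : uri.substr(0, pos);
    FileSystemFactory factory;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = factories_.find(scheme);
      if (it == factories_.end()) {
        throw std::runtime_error("no filesystem registered for scheme '" +
                                 scheme + "'");
      }
      factory = it->second;
    }
    return factory(uri);  // call outside the lock
  }

 private:
  mutable std::mutex mutex_;
  std::map<std::string, FileSystemFactory> factories_;
};

// --- What a hypothetical separate "S3" library might contain ---
class S3LikeFileSystem : public FileSystem {
 public:
  std::string type_name() const override { return "s3"; }
};

namespace {
// Registers the factory during static initialization of this library.
[[maybe_unused]] const bool kS3Registered =
    FileSystemRegistry::Instance().Register("s3", [](const std::string&) {
      return std::make_shared<S3LikeFileSystem>();
    });
}  // namespace
```

With this in place, `FileSystemRegistry::Instance().FromUri("s3://bucket/key")` returns the S3-like filesystem, while `"gs://..."` throws until a Google Cloud library registers itself. One caveat with relying on static initializers: if the plugin is linked as a static library, the linker may drop an object file that nothing references, so an explicit registration call (or loading the plugin as a shared library at runtime) may be needed.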
   
   cc @pitrou @zeroshade 

