bkietz commented on code in PR #39067:
URL: https://github.com/apache/arrow/pull/39067#discussion_r1476489080


##########
cpp/src/arrow/filesystem/filesystem.cc:
##########
@@ -738,6 +761,109 @@ Result<std::shared_ptr<FileSystem>> FileSystemFromUriReal(const Uri& uri,
 
 }  // namespace
 
+Status RegisterFileSystemFactory(std::vector<std::string> schemes,
+                                 FileSystem::Factory factory) {
+  auto& [mutex, scheme_to_factory] = FileSystemFactoryRegistry();
+  std::unique_lock lock{mutex};
+
+  for (auto&& scheme : schemes) {
+    if (auto& ref = scheme_to_factory[scheme]) {
+      return Status::KeyError(
+          "Tried to add a factory for ", scheme,
+          ":// uris, but a factory was already registered for that scheme");
+    } else {
+      ref = factory;
+    }
+  }
+  return Status::OK();
+}
+
+// XXX alternative approach:
+// If we distrust global constructors more than we distrust
+// symbol visibility, we could export specially named functions
+// and retrieve them via dlsym.
+Status RegisterFileSystemFactoryModule(std::string module_name) {
+  static auto* handle = dlopen(nullptr, RTLD_NOW | RTLD_LOCAL);
+  if (!handle) {
+    return Status::Invalid("dlopen failed: ", dlerror());
+  }
+
+  dlerror();
+
+  module_name = "ArrowFileSystemModule_" + module_name;

Review Comment:
   @kou
   > I think that it's convenient that FileSystemFromUri("s3://...") loads
   > S3FileSystem automatically.
   
   This can be accomplished for built-in filesystems as long as `libarrow.so` 
can derive the location and name of `libarrow_fs_s3.so`. However, I think it's 
inevitable that some installations will make these libraries difficult to 
find, in which case the consumer will need to provide the paths to 
`libarrow.so` explicitly.
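   
   A minimal sketch of that lookup, assuming modules follow the 
`libarrow_fs_<scheme>.so` naming pattern and usually sit next to `libarrow.so`. 
`ModuleFileName`, `CandidateModulePaths` and `extra_dirs` are hypothetical names 
for illustration, not part of this PR:

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: expected file name of a filesystem module for a URI
// scheme, generalizing the libarrow_fs_s3.so example (an assumption).
inline std::string ModuleFileName(const std::string& scheme) {
  return "libarrow_fs_" + scheme + ".so";
}

// Hypothetical helper: candidate module paths in search order.
// Directories supplied explicitly by the consumer come first, then the
// directory that contains libarrow.so itself.
inline std::vector<fs::path> CandidateModulePaths(
    const fs::path& libarrow_path, const std::string& scheme,
    const std::vector<fs::path>& extra_dirs = {}) {
  std::vector<fs::path> candidates;
  for (const auto& dir : extra_dirs) {
    candidates.push_back(dir / ModuleFileName(scheme));
  }
  candidates.push_back(libarrow_path.parent_path() / ModuleFileName(scheme));
  return candidates;
}
```

   The caller would then try each candidate in order until one loads; how 
`libarrow.so` learns its own path is left out of the sketch.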
   
   > But it's difficult to finalize automatically loaded file system
   > implementations... For example, users can't call arrow::fs::EnsureS3Finalized()
   > with this API...
   
   Initialization of filesystems can be handled in the factory on the first 
call. Part of this initialization could be appending the implementation's 
finalizer to a list, all of which are invoked by `arrow::fs::Finalize()`. I 
think this is actually cleaner than requiring users to individually finalize 
each filesystem they use.
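   
   Roughly what I have in mind, as a sketch only. `FinalizerRegistry`, 
`GlobalFinalizers`, `ExampleFactory` and `FinalizeAll` are hypothetical names, 
and the "filesystem" is reduced to a pair of counters:

```cpp
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical registry of finalizers; not Arrow's actual implementation.
class FinalizerRegistry {
 public:
  void Add(std::function<void()> finalizer) {
    std::lock_guard<std::mutex> lock(mutex_);
    finalizers_.push_back(std::move(finalizer));
  }

  // Runs each registered finalizer once, newest first, and empties the list.
  void RunAll() {
    std::vector<std::function<void()>> pending;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      pending.swap(finalizers_);
    }
    for (auto it = pending.rbegin(); it != pending.rend(); ++it) (*it)();
  }

 private:
  std::mutex mutex_;
  std::vector<std::function<void()>> finalizers_;
};

inline FinalizerRegistry& GlobalFinalizers() {
  static FinalizerRegistry registry;
  return registry;
}

// Stand-in for an implementation's global state (assumption for the sketch).
struct ExampleState {
  int init_calls = 0;
  bool initialized = false;
};
inline ExampleState& State() {
  static ExampleState state;
  return state;
}

// Stand-in for a factory: the first call initializes the implementation and
// appends its finalizer; later calls skip straight to construction.
inline void ExampleFactory() {
  static const bool initialized_once = [] {
    State().initialized = true;
    ++State().init_calls;
    GlobalFinalizers().Add([] { State().initialized = false; });
    return true;
  }();
  (void)initialized_once;
  // ... construct and return the filesystem here ...
}

// Single entry point the consumer calls instead of per-filesystem finalizers.
inline void FinalizeAll() { GlobalFinalizers().RunAll(); }
```

   The consumer never needs to know which implementations were loaded: 
whatever was initialized has already queued its own finalizer.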
   
   > But this will not work with static library only build.
   
   One advantage of using global constructors (registrars) is that as long as 
the binary which contains the definition is loaded, the factories will be 
registered with no further intervention by the consumer. If the S3 filesystem 
is dynamically loaded, the factory will be registered before `dlopen()` 
returns. Otherwise (for example, if `libarrow_fs_s3.a` is statically linked 
into the consumer), the factory will be registered before `main()` is entered.
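   
   A sketch of the registrar pattern, with a simplified stand-in for 
`FileSystem::Factory` (the real signature differs). `Registry` and `Registrar` 
are hypothetical names here:

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Simplified stand-in for FileSystem::Factory (assumption for the sketch).
using Factory = std::function<std::string(const std::string& uri)>;

// A function-local static is constructed on first use, so the map exists
// even when this is called from another global object's constructor.
inline std::map<std::string, Factory>& Registry() {
  static std::map<std::string, Factory> registry;
  return registry;
}

// Hypothetical registrar: its constructor runs during static initialization
// of whichever binary defines an instance. Unlike RegisterFileSystemFactory
// in the diff, this sketch silently keeps the first factory on a duplicate.
struct Registrar {
  Registrar(std::string scheme, Factory factory) {
    Registry().emplace(std::move(scheme), std::move(factory));
  }
};

// A global instance, as a filesystem module might define in one of its
// source files. The "example" scheme is made up.
static Registrar example_registrar{
    "example", [](const std::string& uri) { return "opened " + uri; }};
```

   One caveat for the static case: as far as I know, linkers only pull an 
object file out of an archive if something references it, so the registrar's 
object file may need to be referenced or force-included (for example with 
`--whole-archive` on GNU ld) for its definition to end up in the binary at all.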



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

