bkietz commented on code in PR #39067:
URL: https://github.com/apache/arrow/pull/39067#discussion_r1476489080
##########
cpp/src/arrow/filesystem/filesystem.cc:
##########
@@ -738,6 +761,109 @@ Result<std::shared_ptr<FileSystem>> FileSystemFromUriReal(const Uri& uri,
 }  // namespace
+Status RegisterFileSystemFactory(std::vector<std::string> schemes,
+                                 FileSystem::Factory factory) {
+  auto& [mutex, scheme_to_factory] = FileSystemFactoryRegistry();
+  std::unique_lock lock{mutex};
+
+  for (auto&& scheme : schemes) {
+    if (auto& ref = scheme_to_factory[scheme]) {
+      return Status::KeyError(
+          "Tried to add a factory for ", scheme,
+          ":// uris, but a factory was already registered for that scheme");
+    } else {
+      ref = factory;
+    }
+  }
+  return Status::OK();
+}
+
+// XXX alternative approach:
+// If we distrust global constructors more than we distrust
+// symbol visibility, we could export specially named functions
+// and retrieve them via dlsym.
+Status RegisterFileSystemFactoryModule(std::string module_name) {
+  static auto* handle = dlopen(nullptr, RTLD_NOW | RTLD_LOCAL);
+  if (!handle) {
+    return Status::Invalid("dlopen failed: ", dlerror());
+  }
+
+  dlerror();
+
+  module_name = "ArrowFileSystemModule_" + module_name;
Review Comment:
> I think that it's convenient that FileSystemFromUri("s3://...") loads S3FileSystem automatically.

This can be accomplished for built-in filesystems as long as `libarrow.so`
can derive the location and name of `libarrow_fs_s3.so`. However, I think it's
inevitable that some installations will make these libraries difficult to
find, in which case the consumer will need to provide the paths to
`libarrow.so` explicitly.
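
To illustrate the lookup described above, here is a minimal sketch of deriving a sibling module path from the location of the core library, with an explicit override for installations where that guess fails. `ResolveModulePath` and its parameters are hypothetical and not part of Arrow; the sketch also assumes a `libarrow_fs_<scheme>.so` naming pattern next to `libarrow.so` and ignores versioned sonames and non-Linux platforms. How the core library learns its own path is left out (on Linux, `dladdr` is one option).

```cpp
#include <string>

namespace locate_sketch {

// Hypothetical helper, not an Arrow API.
// Returns the path from which a filesystem module for `scheme` would be loaded.
// If the consumer supplied `explicit_path`, it always wins; otherwise the module
// is assumed to sit in the same directory as the core library.
inline std::string ResolveModulePath(const std::string& core_library_path,
                                     const std::string& scheme,
                                     const std::string& explicit_path = "") {
  if (!explicit_path.empty()) {
    return explicit_path;
  }
  // Keep everything up to and including the last '/', if there is one.
  const auto slash = core_library_path.find_last_of('/');
  const std::string dir = (slash == std::string::npos)
                              ? std::string()
                              : core_library_path.substr(0, slash + 1);
  // Assumed naming pattern: libarrow_fs_<scheme>.so
  return dir + "libarrow_fs_" + scheme + ".so";
}

}  // namespace locate_sketch
```

With these assumptions, a core library at `/usr/lib/libarrow.so` and the scheme `s3` resolve to `/usr/lib/libarrow_fs_s3.so`, while a non-empty `explicit_path` is returned unchanged.
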
> But it's difficult to finalize automatically loaded file system implementations... For example, users can't call arrow::fs::EnsureS3Finalized() with this API...

Initialization of a filesystem can be handled in its factory on the first
call. Part of this initialization could be appending the implementation's
finalizer to a list, all of whose entries are invoked by `arrow::fs::Finalize()`. I
think this is actually cleaner than requiring users to individually finalize
each filesystem they use.
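
A minimal sketch of that finalizer list, to make the idea concrete. Everything here is hypothetical: `AppendFinalizer`, `Finalize`, `FakeS3Factory`, and `FakeS3State` are stand-ins invented for illustration, the factory returns a string instead of a real `FileSystem`, and running finalizers in registration order is an assumption rather than something decided in this thread.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

namespace finalizer_sketch {

// Hypothetical global list of finalizers (not an Arrow API).
struct FinalizerList {
  std::mutex mutex;
  std::vector<std::function<void()>> finalizers;
};

inline FinalizerList& Finalizers() {
  static FinalizerList list;  // constructed on first use
  return list;
}

// Called by a filesystem implementation once it has initialized itself.
inline void AppendFinalizer(std::function<void()> finalizer) {
  auto& list = Finalizers();
  std::lock_guard<std::mutex> lock(list.mutex);
  list.finalizers.push_back(std::move(finalizer));
}

// Stand-in for the proposed arrow::fs::Finalize(): runs every registered
// finalizer once, in registration order, then leaves the list empty.
inline void Finalize() {
  std::vector<std::function<void()>> pending;
  {
    auto& list = Finalizers();
    std::lock_guard<std::mutex> lock(list.mutex);
    pending.swap(list.finalizers);
  }
  for (auto& finalizer : pending) {
    finalizer();
  }
}

// Observable state of the fake filesystem:
// 0 = untouched, 1 = initialized, 2 = finalized.
inline int& FakeS3State() {
  static int state = 0;
  return state;
}

// Fake factory: initializes lazily on the first call and registers its
// finalizer exactly once, no matter how many filesystems are created.
inline std::string FakeS3Factory(const std::string& uri) {
  static const bool initialized = [] {
    FakeS3State() = 1;
    AppendFinalizer([] { FakeS3State() = 2; });
    return true;
  }();
  (void)initialized;
  return "FakeS3FileSystem(" + uri + ")";
}

}  // namespace finalizer_sketch
```

In this sketch the consumer never names the S3 implementation: calling the factory any number of times registers one finalizer, and a single `Finalize()` call tears down whatever happened to be initialized.
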
> But this will not work with static library only build.

One advantage of using global constructors (registrars) is that as long as
the binary which contains the definition is loaded, the factories will be
registered with no further intervention by the consumer. If the S3 filesystem
is dynamically loaded, the factory will be registered before `dlopen()`
returns. Otherwise (for example, if `libarrow_fs_s3.a` is statically linked into
the consumer), the factory will be registered before `main()` is entered.
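
The registrar pattern mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `Registry`, `RegisterFactory`, `Registrar`, `LookupFactory`, and `kFakeS3Registrar` are hypothetical names, and the factory returns a string rather than a `FileSystem`.

```cpp
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <utility>

namespace registrar_sketch {

// Hypothetical factory type; a real one would return a FileSystem.
using Factory = std::function<std::string(const std::string& uri)>;

struct Registry {
  std::mutex mutex;
  std::map<std::string, Factory> scheme_to_factory;
};

// Function-local static: the registry exists before the first registrar
// touches it, regardless of initialization order across translation units.
inline Registry& GetRegistry() {
  static Registry registry;
  return registry;
}

// Returns false if a factory was already registered for `scheme`.
inline bool RegisterFactory(const std::string& scheme, Factory factory) {
  auto& registry = GetRegistry();
  std::lock_guard<std::mutex> lock(registry.mutex);
  return registry.scheme_to_factory.emplace(scheme, std::move(factory)).second;
}

// Returns an empty std::function if nothing is registered for `scheme`.
inline Factory LookupFactory(const std::string& scheme) {
  auto& registry = GetRegistry();
  std::lock_guard<std::mutex> lock(registry.mutex);
  auto it = registry.scheme_to_factory.find(scheme);
  return it == registry.scheme_to_factory.end() ? Factory() : it->second;
}

// A registrar is an object whose constructor performs the registration.
struct Registrar {
  Registrar(const std::string& scheme, Factory factory) {
    RegisterFactory(scheme, std::move(factory));
  }
};

// Namespace-scope object: its constructor runs when the binary containing it
// is loaded, i.e. before main() when linked in, or during dlopen() when the
// definition lives in a shared library.
static const Registrar kFakeS3Registrar{
    "s3", [](const std::string& uri) { return "FakeS3FileSystem(" + uri + ")"; }};

}  // namespace registrar_sketch
```

By the time `main()` runs, `LookupFactory("s3")` already returns the fake factory without the consumer having called anything. One caveat when the definition lives in a static archive: linkers generally pull in only the archive members that resolve an undefined symbol, so the registrar's object file may need an explicit reference or an option such as `--whole-archive` to be included at all.
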