lidavidm commented on code in PR #34420:
URL: https://github.com/apache/arrow/pull/34420#discussion_r1132879431
##########
cpp/src/arrow/filesystem/filesystem.h:
##########
@@ -476,6 +476,25 @@ Result<std::shared_ptr<FileSystem>>
FileSystemFromUri(const std::string& uri,
const io::IOContext&
io_context,
std::string* out_path =
NULLPTR);
+/// \brief Ensure a URI (or path) is compatible with the given filesystem and
return the
+/// path
+///
+/// \param filesystem A filesystem that should be capable of fetching the URI
+/// \param uri A URI representing a resource in the given filesystem.
+///
+/// This method will check to ensure the given filesystem is compatible with
the
+/// URI. If that behavior is not desired then NULLPTR can be specified for the
+/// filesystem and that check will be skipped.
+///
+/// uri can be an absolute path instead of a URI. In that case it will ensure
the
+/// filesystem (if supplied) is the local filesystem and will normalize the
path's
+/// file separators.
+///
+/// \return The path inside the filesystem that is indicated by the URI.
+ARROW_EXPORT
+Result<std::string> PathFromUriOrPath(const FileSystem* filesystem,
Review Comment:
nit: `const FileSystem&`?
##########
cpp/src/arrow/filesystem/filesystem.cc:
##########
@@ -738,8 +738,57 @@ Result<std::shared_ptr<FileSystem>>
FileSystemFromUriReal(const Uri& uri,
return Status::Invalid("Unrecognized filesystem type in URI: ", uri_string);
}
+Status CheckFileSystem(const FileSystem* filesystem,
+ const std::string& expected_type_name, const
std::string& uri) {
+ if (filesystem && filesystem->type_name() != expected_type_name) {
+ return Status::Invalid("URI ", uri, " is not compatible with filesystem of
type ",
+ filesystem->type_name());
+ }
+ return Status::OK();
+}
+
} // namespace
+Result<std::string> PathFromUriOrPath(const FileSystem* filesystem,
+ const std::string& uri_string) {
+ if (internal::DetectAbsolutePath(uri_string)) {
+ RETURN_NOT_OK(CheckFileSystem(filesystem, "local", uri_string));
+ // Normalize path separators
+ return ToSlashes(uri_string);
+ }
+
+ ARROW_ASSIGN_OR_RAISE(auto uri, ParseFileSystemUri(uri_string));
+ const auto scheme = uri.scheme();
+ std::string path;
+ if (scheme == "file") {
+ RETURN_NOT_OK(CheckFileSystem(filesystem, "local", uri_string));
+ RETURN_NOT_OK(LocalFileSystemOptions::FromUri(uri, &path));
+ } else if (scheme == "gs" || scheme == "gcs") {
+ RETURN_NOT_OK(CheckFileSystem(filesystem, "gcs", uri_string));
+#ifdef ARROW_GCS
+ RETURN_NOT_OK(GcsOptions::FromUri(uri, &path));
+#else
+ return Status::NotImplemented("Got GCS URI but Arrow compiled without GCS
support");
+#endif
+ } else if (scheme == "hdfs" || scheme == "viewfs") {
+ RETURN_NOT_OK(CheckFileSystem(filesystem, "hdfs", uri_string));
+ path = uri.path();
+ } else if (scheme == "s3") {
+ RETURN_NOT_OK(CheckFileSystem(filesystem, "s3", uri_string));
+#ifdef ARROW_S3
+ RETURN_NOT_OK(S3Options::FromUri(uri, &path));
Review Comment:
The other potential problem is, what if the URI specifies a filesystem that
isn't compatible beyond just the type? For instance, we can stuff credentials,
endpoint overrides, etc. into the URI - this won't detect that (that may be
fine so long as it's clear what "compatible" means).
In that case, though, maybe it's best to have this function do nothing other
than turn the URI into a path, and let the application deal with any other
logic.
##########
cpp/src/arrow/filesystem/filesystem.cc:
##########
@@ -738,8 +738,57 @@ Result<std::shared_ptr<FileSystem>>
FileSystemFromUriReal(const Uri& uri,
return Status::Invalid("Unrecognized filesystem type in URI: ", uri_string);
}
+Status CheckFileSystem(const FileSystem* filesystem,
+ const std::string& expected_type_name, const
std::string& uri) {
+ if (filesystem && filesystem->type_name() != expected_type_name) {
+ return Status::Invalid("URI ", uri, " is not compatible with filesystem of
type ",
+ filesystem->type_name());
+ }
+ return Status::OK();
+}
+
} // namespace
+Result<std::string> PathFromUriOrPath(const FileSystem* filesystem,
Review Comment:
I suppose this feels like it should be a virtual method on FileSystem to me,
still. (Probably with a default implementation that returns NotImplemented.) It
would avoid the ifdef tower, and would keep FileSystem as an actual extension
point, avoiding a hardcoded list that we have to maintain inside of Arrow. It
effectively already is implemented as a (static) method, too.
Then again, other methods above have this same pattern, so I suppose it's
consistent.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]