MisterRaindrop commented on code in PR #703:
URL: https://github.com/apache/iceberg-cpp/pull/703#discussion_r3371439007
##########
src/iceberg/arrow/arrow_io.cc:
##########
@@ -473,9 +473,14 @@ class ArrowOutputFile : public OutputFile {
} // namespace
Result<std::string> ArrowFileSystemFileIO::ResolvePath(const std::string&
file_location) {
- if (file_location.find("://") != std::string::npos) {
- ICEBERG_ARROW_ASSIGN_OR_RETURN(auto path,
arrow_fs_->PathFromUri(file_location));
- return path;
+ if (auto pos = file_location.find("://"); pos != std::string::npos) {
+ auto path = arrow_fs_->PathFromUri(file_location);
+ if (path.ok()) {
+ return path.ValueOrDie();
+ }
+ // PathFromUri rejects S3-compatible schemes (s3a/s3n, gs://, oss://);
+ // fall back to the scheme-less bucket/key.
+ return file_location.substr(pos + 3);
Review Comment:
I think this fallback is too broad. As implemented, any URI scheme rejected
by PathFromUri will be stri
pped and interpreted as a path in the wrapped filesystem. For example,
with an S3FileSystem, a mismatch ed URI such as file://bucket/prefix would be
interpreted as bucket/prefix instead of failing fast.
Can we make the fallback explicit and narrow?
1. The scheme should be in an allowlist for the wrapped filesystem. For
S3FileSystem, I think we can
support well-known aliases such as s3a and s3n. Vendor schemes such as oss
or gs should probably be opt- in or discussed separately, because they may also
represent native OSS/GCS locations.
2. The PathFromUri failure should match the expected scheme-mismatch /
unsupported-scheme case. Other errors, such as malformed URI, authority
handling errors, or local-path errors, should still be returned to the caller.
In other words, this should be guarded by both a scheme allowlist and the
expected error condition,
rather than falling back for every PathFromUri failure.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]