westonpace commented on code in PR #35440:
URL: https://github.com/apache/arrow/pull/35440#discussion_r1243016888
##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -2116,28 +2083,86 @@ class S3FileSystem::Impl : public
std::enable_shared_from_this<S3FileSystem::Imp
return DeleteObjectsAsync(bucket, keys).status();
}
+ Future<> EnsureNotFileAsync(const std::string& bucket, const std::string&
key) {
+ if (key.empty()) {
+ // There is no way for a bucket to be a file
+ return Future<>::MakeFinished();
Review Comment:
This is only called in one place which then immediately calls
`DoDeleteDirContentsAsync` which does put us on the IO thread pool. However, I
also don't think you can reasonable expect that an Async method is guaranteed
to spawn a new task (regrettably).
##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -2285,6 +2301,11 @@ Result<FileInfo> S3FileSystem::GetFileInfo(const
std::string& s) {
auto outcome = impl_->client_->HeadObject(req);
if (outcome.IsSuccess()) {
+ bool ends_in_slash = s[s.size() - 1] == '/';
+ if (outcome.GetResult().GetContentLength() == 0 && ends_in_slash) {
+ info.set_type(FileType::Directory);
+ return info;
+ }
Review Comment:
It's not, I've removed it. I think I was just trying to make sense of the
way `GetFileInfo` works with directories while debugging failures. We create
these kinds of files when we create directories recursively. For example, when
we create `foo/bar/baz` recursively (in some bucket `my_bucket`) we will create
empty files `foo/` and `foo/bar/` and I was concerned we were reading these as
files.
However, if `s` is `foo/bar/` here then the trailing slash will be removed
in `S3Path::FromString`. So this newly added branch is impossible to
encounter. Instead, we will look for `foo/bar` and fail to find it. We will
then eventually decide it is a directory because we search for `foo/bar/` as
part of `IsNonEmptyDirectory`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]