westonpace commented on code in PR #35440:
URL: https://github.com/apache/arrow/pull/35440#discussion_r1243016888


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -2116,28 +2083,86 @@ class S3FileSystem::Impl : public 
std::enable_shared_from_this<S3FileSystem::Imp
     return DeleteObjectsAsync(bucket, keys).status();
   }
 
+  Future<> EnsureNotFileAsync(const std::string& bucket, const std::string& 
key) {
+    if (key.empty()) {
+      // There is no way for a bucket to be a file
+      return Future<>::MakeFinished();

Review Comment:
   This is only called in one place which then immediately calls 
`DoDeleteDirContentsAsync` which does put us on the IO thread pool.  However, I 
also don't think you can reasonable expect that an Async method is guaranteed 
to spawn a new task (regrettably).



##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -2285,6 +2301,11 @@ Result<FileInfo> S3FileSystem::GetFileInfo(const 
std::string& s) {
 
     auto outcome = impl_->client_->HeadObject(req);
     if (outcome.IsSuccess()) {
+      bool ends_in_slash = s[s.size() - 1] == '/';
+      if (outcome.GetResult().GetContentLength() == 0 && ends_in_slash) {
+        info.set_type(FileType::Directory);
+        return info;
+      }

Review Comment:
   It's not, I've removed it.  I think I was just trying to make sense of the 
way `GetFileInfo` works with directories while debugging failures.  We create 
these kinds of files when we create directories recursively.  For example, when 
we create `foo/bar/baz` recursively (in some bucket `my_bucket`) we will create 
empty files `foo/` and `foo/bar/` and I was concerned we were reading these as 
files.
   
   However, if `s` is `foo/bar/` here then the trailing slash will be removed 
in `S3Path::FromString`.  So this newly added branch is impossible to 
encounter.  Instead, we will look for `foo/bar` and fail to find it.  We will 
then eventually decide it is a directory because we search for `foo/bar/` as 
part of `IsNonEmptyDirectory`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to