felipecrv commented on code in PR #40147:
URL: https://github.com/apache/arrow/pull/40147#discussion_r1512869976


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -2145,7 +2166,10 @@ class S3FileSystem::Impl : public std::enable_shared_from_this<S3FileSystem::Imp
           child_path_ss << bucket << kSep << child_key;
           child_key = child_path_ss.str();
           if (obj.GetSize() > 0 || !had_trailing_slash) {
-            // We found a real file
+            // We found a real file.
+            // XXX Ideally, for 0-sized files we would also check the Content-Type
+            // against kAwsDirectoryContentType, but ListObjectsV2 does not give
+            // that information.

Review Comment:
   As I said above, trying to support this gets ugly very quickly: the number 
of conditions to check when validating operations explodes, and extra requests 
are needed everywhere.



##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -1214,6 +1215,24 @@ Status SetObjectMetadata(const std::shared_ptr<const KeyValueMetadata>& metadata
   return Status::OK();
 }
 
+bool IsDirectory(std::string_view key, const S3Model::HeadObjectResult& result) {
+  // If it has a non-zero length, it's a regular file
+  if (result.GetContentLength() > 0) {
+    return false;
+  }

Review Comment:
   I don't think it's even possible for all `arrow::FileSystem` functions to 
deal with `/`-ending paths that are sometimes considered to be a "file". It's 
better to simply say "we do not support reading files that end in a `/` in 
`arrow::FileSystem` implementations".
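
   A minimal sketch of what such a rule could look like, assuming a 
hypothetical helper named `ValidateFilePath` (it is not an existing Arrow 
function, and a real implementation would return an `arrow::Status` rather 
than an optional string):

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper, not part of Arrow: returns an error message when
// `path` cannot name a file under the "no files ending in '/'" rule,
// and std::nullopt when the path is acceptable as a file path.
std::optional<std::string> ValidateFilePath(std::string_view path) {
  if (!path.empty() && path.back() == '/') {
    return "Not a file (path has a trailing slash): '" + std::string(path) + "'";
  }
  return std::nullopt;
}
```

With a check like this at the entry points that open files for reading, no 
extra request is needed to decide whether a `/`-ending key is a file.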



##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -1214,6 +1215,24 @@ Status SetObjectMetadata(const std::shared_ptr<const KeyValueMetadata>& metadata
   return Status::OK();
 }
 
+bool IsDirectory(std::string_view key, const S3Model::HeadObjectResult& result) {
+  // If it has a non-zero length, it's a regular file
+  if (result.GetContentLength() > 0) {
+    return false;
+  }
+  // Otherwise, if it has a trailing slash, it's a directory
+  if (internal::HasTrailingSlash(key)) {
+    return true;
+  }
+  // Otherwise, if its content type starts with "application/x-directory",
+  // it's a directory
+  if (::arrow::internal::StartsWith(result.GetContentType(), kAwsDirectoryContentType)) {
+    return true;
+  }

Review Comment:
   This we can support: directories that don't end in a `/`.
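
   One possible reading of the combined policy, as a standalone sketch. 
`ObjectInfo`, `EntryKind` and `Classify` are hypothetical names used only for 
illustration; `ObjectInfo` stands in for the two `HeadObjectResult` fields the 
diff reads, and the content-type constant is assumed from the comment in the 
diff above:

```cpp
#include <cstdint>
#include <string_view>

// Assumed value, taken from the "application/x-directory" comment in the diff.
constexpr std::string_view kDirectoryContentType = "application/x-directory";

enum class EntryKind { File, Directory };

// Hypothetical stand-in for the HeadObject fields used by the check.
struct ObjectInfo {
  int64_t content_length;
  std::string_view content_type;
};

EntryKind Classify(std::string_view key, const ObjectInfo& info) {
  // A key ending in '/' is never treated as a file, whatever its size.
  if (!key.empty() && key.back() == '/') {
    return EntryKind::Directory;
  }
  // A 0-sized object without a trailing slash is a directory only if its
  // content type starts with the directory marker.
  const bool has_dir_type =
      info.content_type.substr(0, kDirectoryContentType.size()) ==
      kDirectoryContentType;
  if (info.content_length == 0 && has_dir_type) {
    return EntryKind::Directory;
  }
  return EntryKind::File;
}
```

The difference from the `IsDirectory` in the diff is the order of the checks: 
the trailing slash is tested first, so a non-empty object whose key ends in 
`/` is still reported as a directory.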



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
