felipecrv commented on code in PR #41232:
URL: https://github.com/apache/arrow/pull/41232#discussion_r1567761961


##########
cpp/src/arrow/filesystem/gcsfs.cc:
##########
@@ -200,6 +201,16 @@ class GcsOutputStream : public arrow::io::OutputStream {
     return internal::ToArrowStatus(stream_.last_status());
   }
 
+  Future<> CloseAsync() override {
+    if (closed_) return Status::OK();
+
+    auto self = std::dynamic_pointer_cast<GcsOutputStream>(shared_from_this());

Review Comment:
   Can't you do this without the cast here? In the body of the lambda you can 
cast the `self.get()` pointer to `GcsOutputStream` to de-virtualize the call.



##########
cpp/src/arrow/filesystem/gcsfs.cc:
##########
@@ -200,6 +201,16 @@ class GcsOutputStream : public arrow::io::OutputStream {
     return internal::ToArrowStatus(stream_.last_status());
   }
 
+  Future<> CloseAsync() override {
+    if (closed_) return Status::OK();
+
+    auto self = std::dynamic_pointer_cast<GcsOutputStream>(shared_from_this());
+    auto deferred = [self = std::move(self)]() -> Status { return 
self->Close(); };

Review Comment:
   This class is not thread-safe, so the fact that the current thread is 
handing a shared_ptr to another thread can only be safe if the caller of this 
function forgets about all the shared pointers to this class after this call.
   
   I wonder how many potential bugs we could find by adding the following 
WARNING:
   
   ```cpp
    Future<> CloseAsync() override {
      auto self = shared_from_this();
      if (self.use_count() > 2) {
        ARROW_LOG(WARNING) << "Too many strong references to GcsOutputStream 
before CloseAsync() call";
      }
      if (closed_) return Status::OK();
      ...
   ```
   
   In other words, the caller of `CloseAsync` should have unique ownership of 
`CloseAsync` so it can give the IO thread unique ownership for the `Close` call.
   
   Without this, every use of the class after `CloseAsync` call is a data race. 
Thankfully, the shared_ptr destructor checks an atomic ref count, so there is 
not risk of double destructor calls.



##########
cpp/src/arrow/filesystem/gcsfs.cc:
##########
@@ -233,6 +244,7 @@ class GcsOutputStream : public arrow::io::OutputStream {
 
  private:
   gcs::ObjectWriteStream stream_;
+  const io::IOContext io_context_;

Review Comment:
   I don't know if this matters now, but since the context might have indirect 
references to `stream_`, it's better to put it before.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to