felipecrv commented on code in PR #41232:
URL: https://github.com/apache/arrow/pull/41232#discussion_r1567761961
##########
cpp/src/arrow/filesystem/gcsfs.cc:
##########
@@ -200,6 +201,16 @@ class GcsOutputStream : public arrow::io::OutputStream {
return internal::ToArrowStatus(stream_.last_status());
}
+ Future<> CloseAsync() override {
+ if (closed_) return Status::OK();
+
+ auto self = std::dynamic_pointer_cast<GcsOutputStream>(shared_from_this());
Review Comment:
Can't you do this without the cast here? In the body of the lambda you can
cast the `self.get()` pointer to `GcsOutputStream` to de-virtualize the call.
##########
cpp/src/arrow/filesystem/gcsfs.cc:
##########
@@ -200,6 +201,16 @@ class GcsOutputStream : public arrow::io::OutputStream {
return internal::ToArrowStatus(stream_.last_status());
}
+ Future<> CloseAsync() override {
+ if (closed_) return Status::OK();
+
+ auto self = std::dynamic_pointer_cast<GcsOutputStream>(shared_from_this());
+ auto deferred = [self = std::move(self)]() -> Status { return
self->Close(); };
Review Comment:
This class is not thread-safe, so the fact that the current thread is
handing a shared_ptr to another thread can only be safe if the caller of this
function forgets about all the shared pointers to this class after this call.
I wonder how many potential bugs we could find by adding the following
WARNING:
```cpp
Future<> CloseAsync() override {
auto self = shared_from_this();
if (self.use_count() > 2) {
ARROW_LOG(WARNING) << "Too many strong references to GcsOutputStream
before CloseAsync() call";
}
if (closed_) return Status::OK();
...
```
In other words, the caller of `CloseAsync` should have unique ownership of
`CloseAsync` so it can give the IO thread unique ownership for the `Close` call.
Without this, every use of the class after `CloseAsync` call is a data race.
Thankfully, the shared_ptr destructor checks an atomic ref count, so there is
not risk of double destructor calls.
##########
cpp/src/arrow/filesystem/gcsfs.cc:
##########
@@ -233,6 +244,7 @@ class GcsOutputStream : public arrow::io::OutputStream {
private:
gcs::ObjectWriteStream stream_;
+ const io::IOContext io_context_;
Review Comment:
I don't know if this matters now, but since the context might have indirect
references to `stream_`, it's better to put it before.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]