OliLay commented on code in PR #41564:
URL: https://github.com/apache/arrow/pull/41564#discussion_r1601402884
##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -1606,42 +1656,60 @@ class ObjectOutputStream final : public io::OutputStream {
       return Status::OK();
     }
 
-    ARROW_ASSIGN_OR_RAISE(auto client_lock, holder_->Lock());
+    if (is_multipart_created_) {
+      ARROW_ASSIGN_OR_RAISE(auto client_lock, holder_->Lock());
 
-    S3Model::AbortMultipartUploadRequest req;
-    req.SetBucket(ToAwsString(path_.bucket));
-    req.SetKey(ToAwsString(path_.key));
-    req.SetUploadId(upload_id_);
+      S3Model::AbortMultipartUploadRequest req;
+      req.SetBucket(ToAwsString(path_.bucket));
+      req.SetKey(ToAwsString(path_.key));
+      req.SetUploadId(upload_id_);
 
-    auto outcome = client_lock.Move()->AbortMultipartUpload(req);
-    if (!outcome.IsSuccess()) {
-      return ErrorToStatus(
-          std::forward_as_tuple("When aborting multiple part upload for key '", path_.key,
-                                "' in bucket '", path_.bucket, "': "),
-          "AbortMultipartUpload", outcome.GetError());
+      auto outcome = client_lock.Move()->AbortMultipartUpload(req);
+      if (!outcome.IsSuccess()) {
+        return ErrorToStatus(
+            std::forward_as_tuple("When aborting multiple part upload for key '",
+                                  path_.key, "' in bucket '", path_.bucket, "': "),
+            "AbortMultipartUpload", outcome.GetError());
+      }
     }
+
     current_part_.reset();
     holder_ = nullptr;
     closed_ = true;
+
     return Status::OK();
   }
 
   // OutputStream interface
 
+  bool ShouldBeMultipartUpload() { return pos_ > kMultiPartUploadThresholdSize; }
+
+  bool IsMultipartUpload() { return ShouldBeMultipartUpload() || is_multipart_created_; }
+
   Status EnsureReadyToFlushFromClose() {
-    if (current_part_) {
-      // Upload last part
-      RETURN_NOT_OK(CommitCurrentPart());
-    }
+    if (IsMultipartUpload()) {
Review Comment:
Can you elaborate more? As I understand it, we call `EnsureReadyToFlushFromClose`
only to flush remaining data when the stream is closed, so sometimes we still have
data in-flight that has to be uploaded (the "last part" which we commit in this
branch, or a dummy part if no data was written at all).
##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -1606,42 +1656,60 @@ class ObjectOutputStream final : public io::OutputStream {
       return Status::OK();
     }
 
-    ARROW_ASSIGN_OR_RAISE(auto client_lock, holder_->Lock());
+    if (is_multipart_created_) {
+      ARROW_ASSIGN_OR_RAISE(auto client_lock, holder_->Lock());
 
-    S3Model::AbortMultipartUploadRequest req;
-    req.SetBucket(ToAwsString(path_.bucket));
-    req.SetKey(ToAwsString(path_.key));
-    req.SetUploadId(upload_id_);
+      S3Model::AbortMultipartUploadRequest req;
+      req.SetBucket(ToAwsString(path_.bucket));
+      req.SetKey(ToAwsString(path_.key));
+      req.SetUploadId(upload_id_);
 
-    auto outcome = client_lock.Move()->AbortMultipartUpload(req);
-    if (!outcome.IsSuccess()) {
-      return ErrorToStatus(
-          std::forward_as_tuple("When aborting multiple part upload for key '", path_.key,
-                                "' in bucket '", path_.bucket, "': "),
-          "AbortMultipartUpload", outcome.GetError());
+      auto outcome = client_lock.Move()->AbortMultipartUpload(req);
+      if (!outcome.IsSuccess()) {
+        return ErrorToStatus(
+            std::forward_as_tuple("When aborting multiple part upload for key '",
+                                  path_.key, "' in bucket '", path_.bucket, "': "),
+            "AbortMultipartUpload", outcome.GetError());
+      }
     }
+
     current_part_.reset();
     holder_ = nullptr;
     closed_ = true;
+
     return Status::OK();
   }
 
   // OutputStream interface
 
+  bool ShouldBeMultipartUpload() { return pos_ > kMultiPartUploadThresholdSize; }
+
+  bool IsMultipartUpload() { return ShouldBeMultipartUpload() || is_multipart_created_; }
Review Comment:
1. Just my personal style. (Also, with the added debug assert we now have to
call `ShouldBeMultipartUpload` separately anyway.)
2. Made them const in
[1e9e3a4](https://github.com/apache/arrow/pull/41564/commits/1e9e3a476874b9d464517f5f8357b370e315fbed).
##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -1597,6 +1632,21 @@ class ObjectOutputStream final : public io::OutputStream {
     }
     upload_id_ = outcome.GetResult().GetUploadId();
     upload_state_ = std::make_shared<UploadState>();
Review Comment:
Yes, we would create the upload state twice (basically resetting it).
I changed it in
[1e9e3a4](https://github.com/apache/arrow/pull/41564/commits/1e9e3a476874b9d464517f5f8357b370e315fbed).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]