Akshat-Jain commented on code in PR #16481:
URL: https://github.com/apache/druid/pull/16481#discussion_r1616625803
##########
extensions-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/output/RetryableS3OutputStream.java:
##########
@@ -271,50 +328,72 @@ public void close() throws IOException
// This should be emitted as a metric
LOG.info(
"Pushed total [%d] parts containing [%d] bytes in [%d]ms.",
- numChunksPushed,
- resultsSize,
+ numChunksPushed.get(),
+ resultsSize.get(),
pushStopwatch.elapsed(TimeUnit.MILLISECONDS)
);
});
- closer.register(() -> org.apache.commons.io.FileUtils.forceDelete(chunkStorePath));
+ try (Closer ignored = closer) {
+ if (!error) {
+ pushCurrentChunk();
+ completeMultipartUpload();
+ }
+ }
+ }
- closer.register(() -> {
- try {
- if (resultsSize > 0 && isAllPushSucceeded()) {
- RetryUtils.retry(
- () -> s3.completeMultipartUpload(
- new CompleteMultipartUploadRequest(config.getBucket(), s3Key, uploadId, pushResults)
- ),
- S3Utils.S3RETRY,
- config.getMaxRetry()
- );
- } else {
- RetryUtils.retry(
- () -> {
- s3.cancelMultiPartUpload(new AbortMultipartUploadRequest(config.getBucket(), s3Key, uploadId));
- return null;
- },
- S3Utils.S3RETRY,
- config.getMaxRetry()
- );
+ private void completeMultipartUpload()
+ {
+ synchronized (fileLock) {
+ while (pendingFiles.get() > 0) {
+ try {
+ LOG.info("Waiting for lock for completing multipart task for uploadId [%s].", uploadId);
Review Comment:
Just to clarify: This is one log line per file, not per "part".
Each worker uploads its stage output (comprising several parts, fewer than
10 in the testing I did) and a success file.
So with 10 workers, 3 stages, and 5 parts per stage, this line would be
logged roughly 10*3*5 + 10*3*1 == 150 part files + 30 success files ==
180 times for such a query.
That doesn't seem like a huge number to me, and I felt it might help in some
debugging situations down the line. Thoughts?
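The estimate above can be written out as a small calculation. This is only an illustrative sketch: `LogLineEstimate` and its `estimate` method are hypothetical names that do not exist in the PR, and it assumes every worker writes the same number of part files plus exactly one success file per stage.

```java
// Hypothetical helper, not part of the PR: estimates how often the log line fires.
final class LogLineEstimate
{
  // Assumes one log line per uploaded file: each worker writes
  // partsPerStage part files plus one success file for every stage.
  static long estimate(int workers, int stages, int partsPerStage)
  {
    long partFiles = (long) workers * stages * partsPerStage;
    long successFiles = (long) workers * stages;
    return partFiles + successFiles;
  }

  public static void main(String[] args)
  {
    // 10 workers, 3 stages, 5 parts per stage: 150 part files + 30 success files
    System.out.println(estimate(10, 3, 5)); // prints 180
  }
}
```

The count grows linearly in each of the three inputs, so doubling the number of workers would double the number of log lines under the same assumptions.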
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]