danielcweeks commented on a change in pull request #1767:
URL: https://github.com/apache/iceberg/pull/1767#discussion_r523787203
##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -87,18 +168,105 @@ public void close() throws IOException {
super.close();
closed = true;
+ currentStagingFile = null;
try {
stream.close();
- s3.putObject(
- PutObjectRequest.builder().bucket(location.bucket()).key(location.key()).build(),
- RequestBody.fromFile(stagingFile));
+ completeUploads();
} finally {
- if (!stagingFile.delete()) {
- LOG.warn("Could not delete temporary file: {}", stagingFile);
+ stagingFiles.forEach(f -> {
+ if (f.exists() && !f.delete()) {
+ LOG.warn("Could not delete temporary file: {}", f);
+ }
+ });
+ }
+ }
+
+ private void initializeMultiPartUpload() {
+ multipartUploadId = s3.createMultipartUpload(CreateMultipartUploadRequest.builder()
+ .bucket(location.bucket()).key(location.key()).build()).uploadId();
+ }
+
+ private void uploadParts() {
+ // exit if multipart has not been initiated
+ if (multipartUploadId == null) {
+ return;
+ }
+
+ stagingFiles.stream()
+ // do not upload the file currently being written
+ .filter(f -> currentStagingFile == null || !currentStagingFile.equals(f))
+ // do not upload any files that have already been processed
+ .filter(Predicates.not(multiPartMap::containsKey))
+ .forEach(f -> {
+ UploadPartRequest uploadRequest = UploadPartRequest.builder()
+ .bucket(location.bucket())
+ .key(location.key())
+ .uploadId(multipartUploadId)
+ .partNumber(stagingFiles.indexOf(f) + 1)
+ .contentLength(f.length())
+ .build();
+
+ CompletableFuture<CompletedPart> future = CompletableFuture.supplyAsync(
Review comment:
@jackye1995, this is the comment I was referring to about the async client: https://github.com/apache/iceberg/pull/1573#discussion_r502725465

Overall, I'm not sure we benefit much from the async client here, because everything around it is still synchronous. If we move to a vectored I/O implementation for reads, that might be a better time to explore switching to async.
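A minimal sketch (not the PR's code) of the pattern being discussed: a blocking, synchronous client call wrapped in `CompletableFuture.supplyAsync`, with the synchronous close path joining every future before the upload can be completed. `blockingUploadPart` and `Part` are hypothetical stand-ins for the sync `S3Client` upload call and the SDK's `CompletedPart`, so the example runs without AWS. The sketch passes an explicit executor; whether the PR does so is not visible in this hunk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class SyncClientAsyncUploadSketch {
  // Hypothetical stand-in for a blocking sync-client call (e.g. uploading one part);
  // returns a fake ETag so the sketch runs without AWS.
  static String blockingUploadPart(int partNumber, byte[] data) {
    return "etag-" + partNumber + "-" + data.length;
  }

  // Hypothetical record of a finished part (the real code uses the SDK's CompletedPart).
  static final class Part {
    final int partNumber;
    final String eTag;

    Part(int partNumber, String eTag) {
      this.partNumber = partNumber;
      this.eTag = eTag;
    }
  }

  // Submits every staged part to the executor, then waits for all of them in order.
  static List<Part> uploadAll(List<byte[]> stagedParts, ExecutorService executor) {
    List<CompletableFuture<Part>> futures = new ArrayList<>();
    for (int i = 0; i < stagedParts.size(); i++) {
      final int partNumber = i + 1; // multipart part numbers are 1-based
      final byte[] data = stagedParts.get(i);
      futures.add(CompletableFuture.supplyAsync(
          () -> new Part(partNumber, blockingUploadPart(partNumber, data)), executor));
    }
    // The close path is synchronous: it must wait for every part before completing
    // the upload, so the caller blocks here regardless of which client did the I/O.
    return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newFixedThreadPool(4);
    try {
      List<Part> done = uploadAll(List.of(new byte[5], new byte[5], new byte[3]), executor);
      done.forEach(p -> System.out.println(p.partNumber + " " + p.eTag));
    } finally {
      executor.shutdown();
    }
  }
}
```

Because `uploadAll` joins every future before returning, the caller blocks just as it would with an async client; the parallelism comes from the executor threads. That illustrates the reviewer's point that an async client adds little while the surrounding code stays synchronous.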