danielcweeks commented on a change in pull request #3813:
URL: https://github.com/apache/iceberg/pull/3813#discussion_r790323941
##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -74,13 +79,16 @@
private final AwsProperties awsProperties;
private CountingOutputStream stream;
- private final List<File> stagingFiles = Lists.newArrayList();
+ private final List<FileAndDigest> stagingFiles = Lists.newArrayList();
private final File stagingDirectory;
private File currentStagingFile;
private String multipartUploadId;
private final Map<File, CompletableFuture<CompletedPart>> multiPartMap =
Maps.newHashMap();
private final int multiPartSize;
private final int multiPartThresholdSize;
+ private final boolean isEtagCheckEnabled;
+ private final MessageDigest completeMessageDigest;
+ private final MessageDigest currentPartMessageDigest;
Review comment:
I don't think we want to recalculate the checksum if that requires rereading
the file. I still feel that pushing the per-file digest into FileAndDigest is
the best approach, while keeping the full-file digest for the put object
upload as well. I think we're over-optimizing by trying to avoid the full-file
calculation, and we could always log the full-file MD5.
There's another option that might simplify things, but it would require a
larger change to how uploads work. If we simply make the first part of the
multipart upload larger (multipart + threshold), then each FileAndDigest would
have a digest associated with it that could be used for either the put object
or the put part upload.
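
To make the first suggestion concrete, here is a minimal sketch of computing
the per-part digest and the full-file digest while the bytes are written, so
nothing has to be reread later. It is not the PR's code: PartDigestSketch,
PartAndDigest (a stand-in for FileAndDigest), and the in-memory buffer that
replaces the staging files are all hypothetical names chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; the real S3OutputStream stages parts in files.
class PartDigestSketch {

  // Hypothetical stand-in for FileAndDigest: staged bytes plus their MD5.
  static final class PartAndDigest {
    final byte[] bytes;
    final byte[] md5;

    PartAndDigest(byte[] bytes, byte[] md5) {
      this.bytes = bytes;
      this.md5 = md5;
    }
  }

  private final int partSize;
  private final List<PartAndDigest> parts = new ArrayList<>();
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private final MessageDigest partDigest = newMd5();
  private final MessageDigest completeDigest = newMd5();

  PartDigestSketch(int partSize) {
    if (partSize <= 0) {
      throw new IllegalArgumentException("partSize must be positive");
    }
    this.partSize = partSize;
  }

  static MessageDigest newMd5() {
    try {
      return MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  static byte[] md5(byte[] data) {
    return newMd5().digest(data);
  }

  // Feed every byte to both digests at write time, splitting on part boundaries.
  void write(byte[] data) {
    int offset = 0;
    while (offset < data.length) {
      int length = Math.min(partSize - buffer.size(), data.length - offset);
      buffer.write(data, offset, length);
      partDigest.update(data, offset, length);
      completeDigest.update(data, offset, length);
      offset += length;
      if (buffer.size() == partSize) {
        finishPart();
      }
    }
  }

  private void finishPart() {
    // digest() returns the hash and resets the MessageDigest for the next part.
    parts.add(new PartAndDigest(buffer.toByteArray(), partDigest.digest()));
    buffer.reset();
  }

  // Flush any trailing partial part and return the full-file MD5.
  byte[] close() {
    if (buffer.size() > 0) {
      finishPart();
    }
    return completeDigest.digest();
  }

  List<PartAndDigest> parts() {
    return parts;
  }
}
```

With this shape, each staged part carries a digest that could accompany a put
part request, and the value returned by close() could accompany a put object
request or simply be logged. The cost is one extra MD5 update per write, which
is the full-file calculation referred to above.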
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]