danielcweeks commented on a change in pull request #3813:
URL: https://github.com/apache/iceberg/pull/3813#discussion_r788357091
##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -74,13 +78,16 @@
private final AwsProperties awsProperties;
private CountingOutputStream stream;
- private final List<File> stagingFiles = Lists.newArrayList();
+ private final List<FileAndDigest> stagingFiles = Lists.newArrayList();
private final File stagingDirectory;
private File currentStagingFile;
private String multipartUploadId;
private final Map<File, CompletableFuture<CompletedPart>> multiPartMap =
Maps.newHashMap();
private final int multiPartSize;
private final int multiPartThresholdSize;
+ private final boolean isEtagCheckEnabled;
+ private final MessageDigest completeMessageDigest;
+ private MessageDigest currentPartMessageDigest;
Review comment:
Rather than computing all of the digests ourselves, it might be cleaner
to wrap each part's stream in `java.security.DigestOutputStream` and let it
calculate the digest used for the ETag check. I'm not sure that approach works
for the complete-object digest, since the parts are uploaded independently, but
I also question whether we need to validate the full checksum at all if every
part has already been verified.
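
A minimal sketch of the wrapping idea, not the PR's actual code: the class
`PartDigestSketch` and the helper `writePartAndDigest` are hypothetical names,
and MD5 is assumed as the digest algorithm for the per-part check. Only
standard JDK calls are used (`MessageDigest`, `DigestOutputStream`).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration; names are not taken from the Iceberg code base.
final class PartDigestSketch {

  // Writes one part through a DigestOutputStream so the digest is updated
  // as a side effect of the write, then returns the digest as lowercase hex.
  // Note: closing the DigestOutputStream also closes the underlying sink.
  static String writePartAndDigest(byte[] data, OutputStream sink)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest md5 = MessageDigest.getInstance("MD5"); // assumed algorithm
    try (DigestOutputStream out = new DigestOutputStream(sink, md5)) {
      out.write(data, 0, data.length);
    }
    return toHex(md5.digest());
  }

  // Converts raw digest bytes to a lowercase hex string.
  static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }

  public static void main(String[] args) throws Exception {
    // An in-memory sink stands in for the staging file of a single part.
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    byte[] part = "abc".getBytes(StandardCharsets.UTF_8);
    System.out.println(writePartAndDigest(part, sink));
  }
}
```

With one `DigestOutputStream` per staging file, each part's digest comes from
the normal write path, which would leave only the complete-object digest (if
it is kept at all) to be tracked separately.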
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]