danielcweeks commented on a change in pull request #3813:
URL: https://github.com/apache/iceberg/pull/3813#discussion_r790323941



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java
##########
@@ -74,13 +79,16 @@
   private final AwsProperties awsProperties;
 
   private CountingOutputStream stream;
-  private final List<File> stagingFiles = Lists.newArrayList();
+  private final List<FileAndDigest> stagingFiles = Lists.newArrayList();
   private final File stagingDirectory;
   private File currentStagingFile;
   private String multipartUploadId;
   private final Map<File, CompletableFuture<CompletedPart>> multiPartMap = Maps.newHashMap();
   private final int multiPartSize;
   private final int multiPartThresholdSize;
+  private final boolean isEtagCheckEnabled;
+  private final MessageDigest completeMessageDigest;
+  private final MessageDigest currentPartMessageDigest;

Review comment:
       I don't think we want to recalculate the checksum if that requires 
rereading the file.  I still feel that pushing the per-file digest into 
FileAndDigest is the best approach, while also keeping the full-file digest 
for the put object upload.  I think we're over-optimizing by trying to avoid 
the full-file calculation, and we could always log the full-file MD5.
   
   There's another option that might simplify things, but it would require a 
larger change to how uploads work.  If we simply make the first part of the 
multipart upload larger (multipart size + threshold), then each FileAndDigest 
would have a digest associated with it that could be used for either the put 
object or the put part upload.
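
   To make the first approach concrete, here is a minimal sketch, not the PR's 
actual code.  The shape of FileAndDigest, the stage() helper, and the 
"s3-part-" file prefix are all assumptions made for illustration.  It hashes 
each staging file as it is written, so nothing has to be reread, and feeds the 
same bytes into a separate full-file digest.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical holder: pairs a staging file with the MD5 computed while it was written.
final class FileAndDigest {
  private final File file;
  private final byte[] digest;

  FileAndDigest(File file, byte[] digest) {
    this.file = file;
    this.digest = digest.clone();
  }

  File file() {
    return file;
  }

  // Base64-encoded MD5, the encoding used by the Content-MD5 header.
  String contentMd5() {
    return Base64.getEncoder().encodeToString(digest);
  }

  // Hypothetical helper: writes data to a new staging file and hashes it on the
  // way through. The same bytes also update completeDigest, which tracks the
  // whole object across all staging files. A null dir means the default temp dir.
  static FileAndDigest stage(File dir, byte[] data, MessageDigest completeDigest)
      throws IOException, NoSuchAlgorithmException {
    File staged = File.createTempFile("s3-part-", ".tmp", dir);
    MessageDigest partDigest = MessageDigest.getInstance("MD5");
    try (OutputStream out =
        new DigestOutputStream(Files.newOutputStream(staged.toPath()), partDigest)) {
      out.write(data);
    }
    completeDigest.update(data);
    return new FileAndDigest(staged, partDigest.digest());
  }
}
```

   With something like this, a put part upload could use the per-file value 
from contentMd5(), while a put object upload (or a log line) could use the 
accumulated full-file digest.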




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


