Re: [PR] Uses content length to determine when to abort the stream. [iceberg]

via GitHub Tue, 14 Oct 2025 09:32:48 -0700


ahmarsuhail commented on code in PR #14329:
URL: https://github.com/apache/iceberg/pull/14329#discussion_r2429790258



##########
aws/src/main/java/org/apache/iceberg/aws/s3/S3InputFile.java:
##########
@@ -76,7 +76,7 @@ public SeekableInputStream newStream() {
     if (s3FileIOProperties().isS3AnalyticsAcceleratorEnabled()) {
       return AnalyticsAcceleratorUtil.newStream(this);
     }
-    return new S3InputStream(client(), uri(), s3FileIOProperties(), metrics());
+    return new S3InputStream(client(), uri(), s3FileIOProperties(), metrics(), getLength());

Review Comment:
   That `read()` causes the abort to fail, because it throws the exception.
   
   The `abort()` code currently swallows the exception, but it still shows up in the logs as:
   
   ```
   software.amazon.awssdk.core.exception.RetryableException: Data read has a different checksum than expected. Was 0x4dd4fa955ccf4a27e2f635a22948298d, but expected 0x00000000000000000000000000000000. This commonly means that the data was corrupted between the client and service.
   
    WARN S3InputStream: An error occurred while aborting the stream
   ```
   
   And this happens pretty frequently. 
   
   I'm still quite new to Iceberg, but from my understanding we should already have the length by this point, so this shouldn't result in a new HEAD call. I expect that:
   * Either it has been passed down from the Avro metadata files
   * Or the Parquet reader has already asked for it to figure out which bytes it needs for the footer.
   
   I don't think this `getLength()` call will cause an extra HEAD, but I could be totally wrong. What do you think?
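   
   To make the expectation above concrete, here is a minimal sketch of a lazily cached length lookup. It is not Iceberg's actual `S3InputFile`: the names `LengthCacheSketch`, `SketchInputFile`, and `CountingHead` are hypothetical, and the HEAD request is simulated with a counter. It assumes the length is either supplied up front (for example from table metadata) or fetched once and remembered.
   
   ```java
   import java.util.function.LongSupplier;

   /** Hypothetical sketch of a cached length lookup; not Iceberg's actual code. */
   public class LengthCacheSketch {

     /** Stand-in for an input file whose length may already be known. */
     static final class SketchInputFile {
       private final LongSupplier headRequest; // simulates an S3 HEAD call
       private Long length; // null until the length is known

       SketchInputFile(Long knownLength, LongSupplier headRequest) {
         this.length = knownLength;
         this.headRequest = headRequest;
       }

       /** Returns the cached length, issuing the simulated HEAD at most once. */
       long getLength() {
         if (length == null) {
           length = headRequest.getAsLong();
         }
         return length;
       }
     }

     /** Counts how many times the simulated HEAD request runs. */
     static final class CountingHead implements LongSupplier {
       private final long size;
       private int calls = 0;

       CountingHead(long size) {
         this.size = size;
       }

       @Override
       public long getAsLong() {
         calls++;
         return size;
       }

       int calls() {
         return calls;
       }
     }

     public static void main(String[] args) {
       // Case 1: the length was passed down from metadata, so no HEAD is needed.
       CountingHead headA = new CountingHead(1024L);
       SketchInputFile fromMetadata = new SketchInputFile(1024L, headA);
       fromMetadata.getLength();
       System.out.println("HEAD calls (length from metadata): " + headA.calls());

       // Case 2: the length is unknown; an earlier caller (e.g. a footer read)
       // triggers the single HEAD, and later calls reuse the cached value.
       CountingHead headB = new CountingHead(2048L);
       SketchInputFile lazy = new SketchInputFile(null, headB);
       lazy.getLength(); // earlier caller
       lazy.getLength(); // later call, e.g. from newStream()
       System.out.println("HEAD calls (lazy lookup, two reads): " + headB.calls());
     }
   }
   ```
   
   Under these assumptions, a `getLength()` call made when the stream is created adds a HEAD only if nothing has asked for the length before. Whether that holds on every code path in the real reader would still need to be checked.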



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


