sarum90 opened a new pull request, #609:
URL: https://github.com/apache/arrow-rs-object-store/pull/609

   # Which issue does this PR close?
   
   N/A - no existing issue. Happy to create one if preferred.
   
   # Rationale for this change
   
   If a reqwest timeout occurs after consuming all bytes from an S3 response 
body but before receiving the EOF signal, `retry_stream` attempts to retry with 
a `Range` header like `bytes=5-4` (where 5 is the file size). This is an 
invalid/backwards range.
   
   Per RFC 7233, servers MUST ignore syntactically invalid Range headers. S3 
follows this spec and returns `200 OK` with the **full file** instead of `206 
Partial Content` or `416 Range Not Satisfiable`.
   
   Without validation, `retry_stream` would read the full file again, causing 
data duplication (e.g., `"hellohello"` instead of `"hello"`).
   
   # What changes are included in this PR?
   
   1. **Early EOF check**: If `range.start >= range.end` before retrying, 
return EOF immediately rather than sending an invalid range request.
   
   2. **206 status validation**: Verify the retry response is `206 Partial 
Content`. If the server returns `200 OK` (indicating it ignored our Range 
header), fail the retry and return the original error.
   
   3. **Content-Range validation**: Verify the `Content-Range` header matches 
the requested range. This catches cases where the server returns a different 
range than requested.
   
   4. **Warning logs**: Added `warn!` logs for the new validation failures to 
aid debugging.
   
   # Are there any user-facing changes?
   
   No breaking changes. Users may see different error behavior in edge cases 
where retries were previously succeeding incorrectly (by silently duplicating 
data).
   
   # Test Coverage
   
   Automated test coverage for this specific bug is limited for two reasons:
   
   1. **Requires real S3**: LocalStack returns `416 Range Not Satisfiable` for 
invalid ranges, which is actually non-compliant with RFC 7233. Real S3 
correctly ignores the invalid range and returns `200 OK` with the full file, 
which is what triggers the bug.
   
   2. **Hard to trigger**: The bug requires a timeout (or other error) from 
reqwest *after* all data is read but *before* EOF is signaled. This is 
timing-dependent and would require integration tests with long sleeps to force 
a timeout.
   
   A manual reproduction script is available in this gist: 
https://gist.github.com/sarum90/cb95633750390db690ef701f08aa20ed
   
   The script demonstrates the bug against real S3 and shows the data 
duplication.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to