alamb opened a new issue, #7242: URL: https://github.com/apache/arrow-rs/issues/7242
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

This keeps coming up in various contexts, so I wanted to make an issue with a clear description of what is going on rather than having it spread out in comments on various PRs / tickets.

TLDR: if a request fails mid-stream (after we begin to read data), it is not retried and the error is returned instead.

As @crepererum put it on https://github.com/apache/arrow-rs/issues/5882#issuecomment-2700954147 :

> So long story short: People agree that this would be a good feature to have, but it requires a proper implementation.

## Streaming ✅

Some APIs like [`ObjectStore::get`](https://docs.rs/object_store/latest/object_store/trait.ObjectStore.html#method.get) are "streaming" in the sense that they start returning data as soon as it comes back from the network (as opposed to buffering the response before returning to the caller).

This is great for performance, as response processing can happen immediately, and it limits memory usage for large payloads 🏆

## Retries ✅

In order to deal with the intermittent errors that occur when processing object store requests, most ObjectStore implementations retry the request if they encounter an error (see [retry.rs](https://github.com/apache/arrow-rs/blob/main/object_store/src/client/retry.rs)).

## Retries + Streaming ❌

However, there is a problem when streaming is mixed with the existing retries. Specifically, if a request fails mid-stream (after some, but not all, of the data has been returned to the client), just retrying the entire request isn't enough, because the client would then potentially be given the same data from the start of the response that it had already been given.

**Describe the solution you'd like**

Implementing retries for streaming reads would need something more complicated, like retrying the request just for the bytes that hadn't already been read.

Any solution for this I think needs:

1. Very good tests / clear documentation

**Describe alternatives you've considered**

@crepererum suggests on https://github.com/apache/arrow-rs/issues/5882#issuecomment-2700954147 :

> retrying would need to make a new request with a new range starting after the last received byte and ideally also an ETAG/version check to ensure that the object that is returned by the retry is the one that was already "in flight". This retry mechanic is obviously chaining/nested, i.e. if the retry fails mid-stream, you wanna have yet another retry that picks up where the previous one ended.

**Additional context**

- https://github.com/apache/arrow-rs/issues/6287
- https://github.com/apache/arrow-rs/pull/6519
- https://github.com/apache/arrow-rs/issues/5882
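
To make the range-plus-ETag idea quoted above concrete, here is a minimal, std-only sketch. It is not the `object_store` API: `RangedSource`, `Response`, `ReadError`, `read_with_resume`, and the in-memory `FlakySource` test double are all hypothetical names invented for illustration. A real implementation would presumably issue an HTTP `Range` request (possibly with an `If-Match` precondition) where this sketch calls `get_from`.

```rust
/// Errors for this sketch (hypothetical, not the object_store error type).
#[derive(Debug, PartialEq)]
enum ReadError {
    Transient,      // connection dropped; safe to retry
    ObjectChanged,  // ETag differs from the first response
    TooManyRetries, // retry budget exhausted
}

type Chunk = Result<Vec<u8>, ReadError>;

/// Hypothetical response to a ranged GET: the ETag plus a stream of chunks.
struct Response {
    etag: String,
    chunks: Box<dyn Iterator<Item = Chunk>>,
}

/// Hypothetical stand-in for "GET bytes `offset..` of one object".
trait RangedSource {
    fn get_from(&mut self, offset: usize) -> Result<Response, ReadError>;
}

/// Streams the object to `emit`, resuming after the last delivered byte
/// whenever a transient failure occurs. Returns the total bytes delivered.
fn read_with_resume<S: RangedSource>(
    source: &mut S,
    max_retries: usize,
    mut emit: impl FnMut(&[u8]),
) -> Result<usize, ReadError> {
    let mut offset = 0; // bytes already handed to the caller
    let mut etag: Option<String> = None; // ETag of the first response
    let mut retries = 0;
    loop {
        let mut failed = false;
        match source.get_from(offset) {
            Ok(response) => {
                // Remember the first ETag; every later attempt must match it.
                let expected = etag.get_or_insert_with(|| response.etag.clone());
                if *expected != response.etag {
                    return Err(ReadError::ObjectChanged);
                }
                for chunk in response.chunks {
                    match chunk {
                        Ok(bytes) => {
                            offset += bytes.len();
                            emit(&bytes);
                        }
                        Err(ReadError::Transient) => {
                            failed = true;
                            break;
                        }
                        Err(e) => return Err(e),
                    }
                }
            }
            Err(ReadError::Transient) => failed = true,
            Err(e) => return Err(e),
        }
        if !failed {
            return Ok(offset);
        }
        if retries == max_retries {
            return Err(ReadError::TooManyRetries);
        }
        retries += 1; // the next attempt starts at `offset`, not at 0
    }
}

/// In-memory test double that drops the stream after a scheduled chunk count.
struct FlakySource {
    data: Vec<u8>,
    chunk_size: usize,      // must be non-zero
    fail_after: Vec<usize>, // per attempt: chunks delivered before a drop
    mutate: bool,           // if true, the ETag changes after the first request
    requests: Vec<usize>,   // offsets requested so far
}

impl RangedSource for FlakySource {
    fn get_from(&mut self, offset: usize) -> Result<Response, ReadError> {
        self.requests.push(offset);
        let version = if self.mutate && self.requests.len() > 1 { 2 } else { 1 };
        let mut chunks: Vec<Chunk> = self.data[offset..]
            .chunks(self.chunk_size)
            .map(|c| Ok(c.to_vec()))
            .collect();
        if !self.fail_after.is_empty() {
            // Cut this attempt short to simulate a mid-stream failure.
            chunks.truncate(self.fail_after.remove(0));
            chunks.push(Err(ReadError::Transient));
        }
        Ok(Response { etag: format!("v{}", version), chunks: Box::new(chunks.into_iter()) })
    }
}
```

The property the sketch relies on is that `offset` only advances after a chunk has been handed to the caller, so a retry never replays bytes the caller has already seen. The retry budget here is shared across the whole read rather than reset per attempt, which is one of several design choices a real implementation would need to settle and document.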