alamb opened a new issue, #386:
URL: https://github.com/apache/arrow-rs-object-store/issues/386
**Describe the bug**
When downloading a large enough file on a slow enough network, retrieving
streaming results from `ObjectStore::get` will still fail with "operation timed
out" even when the client is making active progress.
**To Reproduce**
```rust
use std::time::{Duration, Instant};
use futures::StreamExt;
use object_store::ObjectStore;
use object_store::path::Path;
#[tokio::main]
async fn main() {
    let start = Instant::now();
    let object_store_url = "https://datasets.clickhouse.com";
    let client_options = object_store::ClientOptions::default()
        // set the request timeout to 1 second
        .with_timeout(Duration::from_secs(1));
    let object_store = object_store::http::HttpBuilder::new()
        .with_client_options(client_options)
        .with_url(object_store_url)
        .build()
        .unwrap();
    // this is a 14GB file
    let file_path = Path::from("hits_compatible/hits.parquet");
    let response = object_store.get(&file_path).await.unwrap();
    // read the response body relatively slowly
    let mut stream = response.into_stream();
    while let Some(chunk) = stream.next().await {
        let _chunk = chunk.unwrap();
        // throttle the read speed
        tokio::time::sleep(Duration::from_millis(100)).await;
    }
    println!("{:?} Done", start.elapsed());
}
```
Results in
```
thread 'main' panicked at src/main.rs:29:27:
called `Result::unwrap()` on an `Err` value: Generic { store: "HTTP",
source: HttpError { kind: Timeout, source: reqwest::Error { kind: Body, source:
reqwest::Error { kind: Decode, source: reqwest::Error { kind: Body, source:
TimedOut } } } } }
stack backtrace:
...
```
**Expected behavior**
As long as the client is consuming data and the server doesn't shut the
connection, I expect that the program will successfully complete and read the
entire file.
**Additional context**
The fix for https://github.com/apache/arrow-rs-object-store/issues/15 from
@tustvold makes this much better (now the first 10 timeouts are retried), but
eventually the timeout still happens.
**Potential ideas**
1. Reset the retry counter once any data has been successfully read from the
result stream
2. Have separate timeout / retry policies that are applied to timeout errors
specifically
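Idea 1 could be sketched as a retry budget that is restored whenever the stream makes forward progress, so that only *consecutive* timeouts can exhaust it. This is a minimal illustration, not the actual `object_store` retry implementation; the `RetryBudget` type and its methods are hypothetical names invented for this sketch.

```rust
/// Hypothetical sketch of idea 1: a retry budget that resets
/// whenever the body stream successfully yields data.
struct RetryBudget {
    max_retries: usize,
    used: usize,
}

impl RetryBudget {
    fn new(max_retries: usize) -> Self {
        Self { max_retries, used: 0 }
    }

    /// Called when a chunk of the body arrives successfully.
    fn record_progress(&mut self) {
        // Any successful read restores the full budget, so only
        // `max_retries + 1` *consecutive* timeouts abort the download.
        self.used = 0;
    }

    /// Called when a timeout occurs; returns true if we may retry.
    fn record_timeout(&mut self) -> bool {
        self.used += 1;
        self.used <= self.max_retries
    }
}

fn main() {
    let mut budget = RetryBudget::new(10);
    // 9 timeouts, then progress, then 9 more timeouts: still retryable,
    // because the successful read reset the counter in between.
    for _ in 0..9 {
        assert!(budget.record_timeout());
    }
    budget.record_progress();
    for _ in 0..9 {
        assert!(budget.record_timeout());
    }
    println!("still within retry budget");
}
```

With a policy like this, the 14GB download above would only fail if the connection stalled for more than the timeout on 11 consecutive attempts, rather than after 10 retries spread across the whole transfer.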
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]