alamb opened a new issue, #386:
URL: https://github.com/apache/arrow-rs-object-store/issues/386

   **Describe the bug**
   When downloading a sufficiently large file over a sufficiently slow network, consuming the streaming results from `ObjectStore::get` still fails with "operation timed out" even though the stream is making active progress
   
   
   
   **To Reproduce**
   ```rust
   use std::time::{Duration, Instant};
   use futures::StreamExt;
   use object_store::ObjectStore;
   use object_store::path::Path;
   
   
   #[tokio::main]
   async fn main() {
       let start = Instant::now();
       let object_store_url = "https://datasets.clickhouse.com";
       let client_options = object_store::ClientOptions::default()
        // set the timeout to 1 second
           .with_timeout(Duration::from_secs(1));
   
       let object_store = object_store::http::HttpBuilder::new()
           .with_client_options(client_options)
           .with_url(object_store_url)
           .build()
           .unwrap();
   
       // this is a 14GB file
       let file_path = Path::from("hits_compatible/hits.parquet");
       let response = object_store.get(&file_path).await.unwrap();
   
       // read the response body relatively slowly
       let mut stream = response.into_stream();
       while let Some(chunk) = stream.next().await {
        let _chunk = chunk.unwrap();
           // throttle the read speed
           tokio::time::sleep(Duration::from_millis(100)).await;
       }
   
       println!("{:?} Done", start.elapsed());
   }
   
   ```
   
   Results in:
   ```
   thread 'main' panicked at src/main.rs:29:27:
   called `Result::unwrap()` on an `Err` value: Generic { store: "HTTP", source: HttpError { kind: Timeout, source: reqwest::Error { kind: Body, source: reqwest::Error { kind: Decode, source: reqwest::Error { kind: Body, source: TimedOut } } } } }
   stack backtrace:
   ...
   ```
   
   **Expected behavior**
   As long as the client keeps consuming data and the server does not close the connection, I expect the program to complete successfully and read the entire file.
   
   **Additional context**
   The fix for https://github.com/apache/arrow-rs-object-store/issues/15 from @tustvold improves this considerably (the first 10 timeouts are now retried), but the timeout still eventually occurs.
   
   **Potential ideas**
   
   1. Reset the retry counter whenever any data has been successfully read from the result stream
   2. Apply separate timeout / retry policies to timeout errors specifically
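
   Idea 1 could be sketched as a small piece of state (a hypothetical sketch, not object_store's actual retry implementation): only *consecutive* timeouts count against the retry limit, and any successful read resets the counter, so a slow-but-progressing download never exhausts its retries.

   ```rust
   /// Hypothetical retry state: tracks consecutive timeouts only.
   struct RetryState {
       max_retries: usize,
       consecutive_timeouts: usize,
   }

   impl RetryState {
       fn new(max_retries: usize) -> Self {
           Self { max_retries, consecutive_timeouts: 0 }
       }

       /// Called after a chunk is read successfully: progress was made,
       /// so earlier timeouts are forgotten.
       fn on_progress(&mut self) {
           self.consecutive_timeouts = 0;
       }

       /// Called when a read times out; returns true if we should retry.
       fn on_timeout(&mut self) -> bool {
           self.consecutive_timeouts += 1;
           self.consecutive_timeouts <= self.max_retries
       }
   }

   fn main() {
       let mut state = RetryState::new(10);
       // 9 timeouts, then progress, then 9 more timeouts: all retryable,
       // because the counter resets once data flows again.
       for _ in 0..9 {
           assert!(state.on_timeout());
       }
       state.on_progress();
       for _ in 0..9 {
           assert!(state.on_timeout());
       }
       println!("still retrying after progress");
   }
   ```

   With the current behavior the 10-retry budget is shared across the whole stream, so a long transfer with many transient stalls eventually fails even though each stall individually recovered.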

