[I] Add an is_temporary flag to errors [arrow-rs-object-store]

via GitHub Tue, 02 Jun 2026 15:26:02 -0700


westonpace opened a new issue, #737:
URL: https://github.com/apache/arrow-rs-object-store/issues/737


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   I'd like to do my own retry handling because I want to integrate it with an 
AIMD-style throttling mechanism.  However, there is no easy way to determine if 
an error should be retried or not.  Each cloud provider is slightly different 
and object_store already does a lot of work examining errors to figure this 
out.  It would be great if the crate's error types indicated if errors were 
temporary or permanent.
   
   **Describe the solution you'd like**
   
   I think something like OpenDAL's 
[`is_temporary`](https://opendal.apache.org/docs/rust/opendal/struct.Error.html#method.is_temporary)
 and `is_permanent` would work fine, although I don't actually understand why 
`is_temporary`, `is_persistent`, and `is_permanent` are all needed.  It seems 
like just one of the three should be sufficient.
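
As a rough illustration of the request, here is a minimal sketch of what such a flag could look like. `StoreError` and its variants are hypothetical stand-ins, not the actual `object_store::Error`, and the set of status codes treated as temporary is an assumption. The point is only that the classification comes from structured data rather than from the error message.

```rust
/// Hypothetical error type standing in for `object_store::Error`.
/// None of these variants are claimed to match the real crate.
#[allow(dead_code)]
#[derive(Debug)]
pub enum StoreError {
    NotFound { path: String },
    PermissionDenied { path: String },
    /// A failed request, carrying the final HTTP status the store saw.
    Http { status: u16, message: String },
}

impl StoreError {
    /// Returns true if retrying the same request later might succeed.
    /// The status list below is an assumption made for this sketch.
    pub fn is_temporary(&self) -> bool {
        match self {
            StoreError::Http { status, .. } => {
                matches!(*status, 408 | 429 | 500 | 502 | 503 | 504)
            }
            StoreError::NotFound { .. } | StoreError::PermissionDenied { .. } => false,
        }
    }
}
```

With something like this, a path or message that happens to contain `slowdown` or `429` has no effect on the result, because only the status field is inspected.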
   
   
   **Describe alternatives you've considered**
   
   At the moment we do 
[this](https://github.com/lance-format/lance/blob/main/rust/lance-io/src/object_store/throttle.rs)
 which makes me very uncomfortable (e.g. what if the file URI contains `slowdown`?):
   
   ```
   
   /// Check whether an `object_store::Error` represents a throttle response
   /// (HTTP 429 / 503) from a cloud object store.
   ///
   /// Regrettably, this information is not fully exposed by the `object_store` crate.
   /// There is no generic mechanism for a custom object store to return a throttle error.
   ///
   /// However, the builtin object stores all use RetryError when retries are configured and
   /// throttle errors are returned.  Sadly, RetryError is not a public type, so we have to
   /// infer it from the error message.  This is potentially dangerous because these errors
   /// often include the URI itself and that URI could have any characters in it (e.g. if we
   /// look for 429 then we might match a 429 in a UUID). These error messages currently look like:
   ///
   /// ", after ... retries, max_retries: ..., retry_timeout: ..."
   ///
   /// So, as a crude heuristic, which should work for the builtin object stores, but won't
   /// work for custom object stores, we simply look for the string "retries, max_retries"
   /// in the error message.
   pub fn is_throttle_error(err: &object_store::Error) -> bool {
       // Only Generic errors can carry throttle responses
       if let object_store::Error::Generic { source, .. } = err {
           let message = source.to_string();
           let lowercase = message.to_ascii_lowercase();
           lowercase.contains("retries, max_retries")
               || lowercase.contains("serverbusy")
               || lowercase.contains("server busy")
               || lowercase.contains("egress is over the account limit")
               || lowercase.contains("http 429")
               || lowercase.contains("status code: 429")
               || lowercase.contains("429 too many requests")
               || lowercase.contains("too many requests")
               || lowercase.contains("slowdown")
               || lowercase.contains("please reduce your request rate")
               || lowercase.contains("rate limit")
               || lowercase.contains("throttling")
               || lowercase.contains("throttled")
       } else {
           false
       }
   }
   ```
   
   Another alternative I explored was implementing this AIMD throttling as an 
object_store retry config of some kind but the retry config mechanism is not 
flexible enough for this.
   
   **Additional context**
   
   The primary reason we want AIMD throttling is because we often have many 
nodes accessing cloud storage.  When something bad happens and they all start 
making a lot of requests against object storage then the whole system gets 
overloaded and is unable to recover.  With AIMD the system will still slow down 
but it does recover and continues.
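
For readers unfamiliar with AIMD (additive increase, multiplicative decrease), the following is a minimal sketch of the control rule, assuming a single numeric limit such as requests per second. `AimdLimit` and its parameters are hypothetical names made up for this example; this is not the Lance implementation and not part of `object_store`.

```rust
/// Hypothetical AIMD controller for a single limit (e.g. requests per second).
#[derive(Debug, Clone)]
pub struct AimdLimit {
    current: f64,
    min: f64,
    max: f64,
    /// Amount added to the limit after each successful request.
    increase: f64,
    /// Factor in (0, 1) applied to the limit after each throttle error.
    decrease: f64,
}

impl AimdLimit {
    pub fn new(initial: f64, min: f64, max: f64, increase: f64, decrease: f64) -> Self {
        assert!(min > 0.0 && min <= initial && initial <= max);
        assert!(increase > 0.0);
        assert!(decrease > 0.0 && decrease < 1.0);
        Self { current: initial, min, max, increase, decrease }
    }

    /// Additive increase: probe for more capacity slowly, capped at `max`.
    pub fn on_success(&mut self) {
        self.current = (self.current + self.increase).min(self.max);
    }

    /// Multiplicative decrease: back off sharply, but never below `min`.
    pub fn on_throttle(&mut self) {
        self.current = (self.current * self.decrease).max(self.min);
    }

    pub fn current(&self) -> f64 {
        self.current
    }
}
```

The caller would invoke `on_throttle` only for errors known to be temporary, which is why a reliable `is_temporary`-style signal matters: misclassifying a permanent error as a throttle would needlessly cut the limit.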
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
