westonpace opened a new issue, #737: URL: https://github.com/apache/arrow-rs-object-store/issues/737
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

I'd like to do my own retry handling because I want to integrate it with an AIMD-style throttling mechanism. However, there is no easy way to determine if an error should be retried or not. Each cloud provider is slightly different and object_store already does a lot of work examining errors to figure this out. It would be great if the crate's error types indicated if errors were temporary or permanent.

**Describe the solution you'd like**

I think something like OpenDAL's [`is_temporary`](https://opendal.apache.org/docs/rust/opendal/struct.Error.html#method.is_temporary) and `is_permanent` would work fine. Although, I would say I don't actually understand why `is_temporary`, `is_persistent`, and `is_permanent` are all needed. It seems like just one of the three should be sufficient.

**Describe alternatives you've considered**

At the moment we do [this](https://github.com/lance-format/lance/blob/main/rust/lance-io/src/object_store/throttle.rs) which makes me very uncomfortable (e.g. what if the file URI has `slowdown`):

```
/// Check whether an `object_store::Error` represents a throttle response
/// (HTTP 429 / 503) from a cloud object store.
///
/// Regrettably, this information is not fully exposed by the `object_store` crate.
/// There is no generic mechanism for a custom object store to return a throttle error.
///
/// However, the builtin object stores all use RetryError when retries are configured and
/// throttle errors are returned. Sadly, RetryError is not a public type, so we have to
/// infer it from the error message. This is potentially dangerous because these errors
/// often include the URI itself and that URI could have any characters in it (e.g. if we
/// look for 429 then we might match a 429 in a UUID). These error messages currently look like:
///
/// ", after ... retries, max_retries: ..., retry_timeout: ..."
///
/// So, as a crude heuristic, which should work for the builtin object stores, but won't
/// work for custom object stores, we simply look for the string "retries, max_retries"
/// in the error message.
pub fn is_throttle_error(err: &object_store::Error) -> bool {
    // Only Generic errors can carry throttle responses
    if let object_store::Error::Generic { source, .. } = err {
        let message = source.to_string();
        let lowercase = message.to_ascii_lowercase();
        lowercase.contains("retries, max_retries")
            || lowercase.contains("serverbusy")
            || lowercase.contains("server busy")
            || lowercase.contains("egress is over the account limit")
            || lowercase.contains("http 429")
            || lowercase.contains("status code: 429")
            || lowercase.contains("429 too many requests")
            || lowercase.contains("too many requests")
            || lowercase.contains("slowdown")
            || lowercase.contains("please reduce your request rate")
            || lowercase.contains("rate limit")
            || lowercase.contains("throttling")
            || lowercase.contains("throttled")
    } else {
        false
    }
}
```

Another alternative I explored was implementing this AIMD throttling as an object_store retry config of some kind but the retry config mechanism is not flexible enough for this.

**Additional context**

The primary reason we want AIMD throttling is because we often have many nodes accessing cloud storage. When something bad happens and they all start making a lot of requests against object storage then the whole system gets overloaded and is unable to recover. With AIMD the system will still slow down but it does recover and continues.
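For readers unfamiliar with the term, here is a rough sketch of the AIMD (additive increase, multiplicative decrease) idea referred to above. It is not the Lance implementation and not part of `object_store`: the names `Outcome` and `AimdLimiter` and all parameters are hypothetical, and it assumes the caller already has some reliable way to tell a throttle response from a success, which is exactly what this issue asks for.

```rust
/// Hypothetical classification of a finished request.
/// Assumption: the caller can tell throttling apart from success;
/// `object_store` does not currently expose this directly.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Outcome {
    Success,
    Throttled,
}

/// Minimal AIMD rate controller (illustrative only).
#[derive(Debug)]
pub struct AimdLimiter {
    /// Current allowed request rate (requests per second).
    rate: f64,
    /// Lower bound so the rate never collapses to zero.
    min_rate: f64,
    /// Upper bound on the rate.
    max_rate: f64,
    /// Amount added to the rate after each success.
    increase: f64,
    /// Factor in (0, 1) applied to the rate after each throttle.
    decrease: f64,
}

impl AimdLimiter {
    pub fn new(initial: f64, min_rate: f64, max_rate: f64, increase: f64, decrease: f64) -> Self {
        assert!(min_rate > 0.0 && min_rate <= max_rate);
        assert!(increase > 0.0);
        assert!(decrease > 0.0 && decrease < 1.0);
        Self {
            rate: initial.clamp(min_rate, max_rate),
            min_rate,
            max_rate,
            increase,
            decrease,
        }
    }

    /// Current allowed request rate.
    pub fn rate(&self) -> f64 {
        self.rate
    }

    /// Update the rate from the outcome of one request:
    /// grow linearly on success, shrink geometrically on throttle.
    pub fn record(&mut self, outcome: Outcome) {
        let next = match outcome {
            Outcome::Success => self.rate + self.increase,
            Outcome::Throttled => self.rate * self.decrease,
        };
        self.rate = next.clamp(self.min_rate, self.max_rate);
    }
}
```

In this sketch each node would call `record` after every request and pace itself to `rate()`. Because throttling halves the rate (with `decrease = 0.5`) while successes only add a small constant, a fleet of nodes backs off quickly under overload and then ramps up gradually, which is the recovery behavior described above.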
