kdn36 opened a new issue, #726:
URL: https://github.com/apache/arrow-rs-object-store/issues/726

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   Using object_store from polars with high-concurrency IO in distributed mode 
in an EKS environment backed by S3 storage can lead to DNS flooding, which in 
turn significantly slows down the overall query.
   
   **Describe the solution you'd like**
   We would like object_store to have a lightweight built-in DNS cache with 
address shuffling, exposed through configuration option(s). It should be much 
smaller than a full-blown DNS resolver such as `hickory-resolver`.
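   To make the request concrete, here is a minimal sketch of such a cache in 
plain Rust using only `std`. It is not object_store's or reqwest's API: the 
type `CachingShuffleResolver`, its `lookup` method, and the fixed TTL are all 
hypothetical. A fixed TTL is assumed because `getaddrinfo` (which backs std's 
`ToSocketAddrs`) does not report per-record TTLs. Hooking it into reqwest's 
pluggable resolver is not shown.

```rust
use std::collections::HashMap;
use std::io;
use std::net::{SocketAddr, ToSocketAddrs};
use std::sync::Mutex;
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

/// Hypothetical resolver signature: (host, port) -> addresses.
type ResolveFn = dyn Fn(&str, u16) -> io::Result<Vec<SocketAddr>> + Send + Sync;

/// Hypothetical TTL cache for DNS results that shuffles the cached addresses
/// on every lookup, so new connections spread across all returned IPs
/// without sending another DNS query.
pub struct CachingShuffleResolver {
    ttl: Duration,
    resolve: Box<ResolveFn>,
    cache: Mutex<HashMap<(String, u16), (Instant, Vec<SocketAddr>)>>,
    rng_state: Mutex<u64>,
}

impl CachingShuffleResolver {
    pub fn new(ttl: Duration, resolve: Box<ResolveFn>) -> Self {
        // std has no RNG, so seed a tiny xorshift64 generator from the clock.
        let seed = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_nanos() as u64)
            .unwrap_or(0)
            | 1; // xorshift state must never be zero
        Self {
            ttl,
            resolve,
            cache: Mutex::new(HashMap::new()),
            rng_state: Mutex::new(seed),
        }
    }

    /// Variant backed by the system resolver via std's `ToSocketAddrs`.
    pub fn system(ttl: Duration) -> Self {
        Self::new(
            ttl,
            Box::new(|host: &str, port: u16| -> io::Result<Vec<SocketAddr>> {
                Ok((host, port).to_socket_addrs()?.collect())
            }),
        )
    }

    /// One xorshift64 step (not cryptographic; only used for shuffling).
    fn next_u64(&self) -> u64 {
        let mut state = self.rng_state.lock().unwrap();
        let mut x = *state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        *state = x;
        x
    }

    pub fn lookup(&self, host: &str, port: u16) -> io::Result<Vec<SocketAddr>> {
        let key = (host.to_string(), port);
        // The cache lock is held only for this statement.
        let cached = self
            .cache
            .lock()
            .unwrap()
            .get(&key)
            .filter(|(at, _)| at.elapsed() < self.ttl)
            .map(|(_, addrs)| addrs.clone());
        let mut addrs = match cached {
            Some(addrs) => addrs,
            None => {
                // Miss or expired entry: query the underlying resolver once.
                let fresh = (self.resolve)(host, port)?;
                self.cache
                    .lock()
                    .unwrap()
                    .insert(key, (Instant::now(), fresh.clone()));
                fresh
            }
        };
        // Fisher-Yates shuffle of the (cached) address list.
        for i in (1..addrs.len()).rev() {
            let j = (self.next_u64() % (i as u64 + 1)) as usize;
            addrs.swap(i, j);
        }
        Ok(addrs)
    }
}
```

   This keeps the shuffling behaviour that object_store already has while 
issuing at most one resolution per host and port per TTL window. For brevity, 
concurrent misses on the same host may still resolve in parallel. A real 
implementation would probably want single-flight deduplication and a 
configurable TTL.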
   
   **Describe alternatives you've considered**
   (1) Out of the box, object_store gives shuffling but no caching.
   (2) One can add `reqwest` with the `hickory-resolver` feature and turn off 
`RandomizeAddresses`, but this only gives caching and no shuffling. The cost 
(pulling in a full DNS resolver) is high, too high for a dataframe library, and 
the configuration feels hacky.
   (3) One can add a custom client and connector via `with_http_connector` 
that takes on the extra DNS responsibilities, but then the logic for handling 
`ClientOptions` has to be replicated, which may drift over time.
   
   For now, option (3) may be our best stopgap solution, but it would be 
cleaner and easier to maintain if all of this functionality were contained in 
object_store and configurable there.
   
   **Additional context**
   The following setup can trigger DNS flooding:
   - default object_store and reqwest as used by polars
   - polars distributed with high concurrency (a large TCP pool to serve many 
concurrent `get_range` requests)
   - polars distributed running on default EKS, e.g., on 32 instances (note: 
on EKS, DNS is a shared cluster service, unlike on a typical standalone OS)
   - (further aggravated by the default `ndots:5` and dual-stack IPv4+v6)
   Together, these result in DNS flooding of the pod that serves CoreDNS.
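   The `ndots:5` and dual-stack amplification can be made concrete with a small 
model of the search-list rule from resolv.conf(5). A name with fewer than 
`ndots` dots is tried with each search domain appended before being tried as 
is, and dual stack means an A and an AAAA query per candidate. The function 
names and the three-entry search list (a typical Kubernetes pod list for the 
`default` namespace) are illustrative assumptions. The real list depends on 
the namespace and node configuration.

```rust
/// Names the stub resolver tries, in order, per the resolv.conf(5) ndots rule.
fn candidate_names(host: &str, ndots: usize, search: &[&str]) -> Vec<String> {
    // A trailing dot marks a fully qualified name: no search list is applied.
    if host.ends_with('.') {
        return vec![host.to_string()];
    }
    let dots = host.matches('.').count();
    let expanded: Vec<String> = search.iter().map(|d| format!("{host}.{d}")).collect();
    if dots >= ndots {
        // Enough dots: try the name as-is first, then the search list.
        std::iter::once(host.to_string()).chain(expanded).collect()
    } else {
        // Too few dots: walk the search list first, the bare name last.
        expanded.into_iter().chain(std::iter::once(host.to_string())).collect()
    }
}

/// Worst-case queries for one resolution: every candidate is queried, once for
/// A and, with dual stack, once more for AAAA. With ndots:5 an S3 hostname
/// hits this worst case, since every search-domain candidate returns NXDOMAIN
/// before the bare name is tried last.
fn worst_case_queries(host: &str, ndots: usize, search: &[&str], dual_stack: bool) -> usize {
    let record_types = if dual_stack { 2 } else { 1 };
    candidate_names(host, ndots, search).len() * record_types
}
```

   For example, `my-bucket.s3.us-east-1.amazonaws.com` has 4 dots, fewer than 
5, so with the assumed search list each resolution costs 4 candidates times 2 
record types, or 8 queries, instead of 2. Multiplied by every new connection 
in a large pool on each of 32 instances, this is consistent with the CoreDNS 
load described above.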
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
