ariel-miculas opened a new issue, #693:
URL: https://github.com/apache/arrow-rs-object-store/issues/693

   **Describe the bug**
   See https://github.com/apache/datafusion/issues/21450
   Root cause: there is one `spawn_blocking` call per 8 KiB read from the file, which adds significant context-switch overhead.
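
   To put the root cause in perspective, here is a rough back-of-the-envelope count of the blocking calls a full scan needs at 8 KiB per call versus 2 MiB per call. It assumes that the 51G test file is 51 GiB and that every chunk costs exactly one `spawn_blocking` hop. `blocking_calls` is a hypothetical helper used only for this illustration.

```rust
// Assumed unit sizes (binary units).
const KIB: u64 = 1024;
const MIB: u64 = 1024 * KIB;
const GIB: u64 = 1024 * MIB;

/// Hypothetical helper: number of blocking round-trips needed to read
/// `total_bytes` when each blocking call reads at most `chunk_bytes`.
/// Uses ceiling division so a trailing partial chunk still counts as one call.
fn blocking_calls(total_bytes: u64, chunk_bytes: u64) -> u64 {
    assert!(chunk_bytes > 0, "chunk size must be non-zero");
    total_bytes.div_ceil(chunk_bytes)
}
```

   Under these assumptions, `blocking_calls(51 * GIB, 8 * KIB)` is 6,684,672 calls for one pass over the file. `blocking_calls(51 * GIB, 2 * MIB)` is 26,112 calls, which is 256 times fewer.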
   
   **To Reproduce**
   See https://github.com/apache/datafusion/issues/21446
   For the tests I used a c7a.16xlarge EC2 instance, a version of hits.json trimmed down to 51G (the original is 217 GiB), and a warm page cache (populated by running `cat hits_50.json > /dev/null`).
   
   **Expected behavior**
   A more efficient implementation (for example, tokio uses a buffer size of 2 MiB when reading files).
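
   Below is a minimal std-only sketch of the expected behavior. It reads a byte range in large chunks inside one synchronous loop, so an async caller would pay for one blocking hop per range rather than one per 8 KiB. The helper name, its signature, and the chunk size are hypothetical. This is not `object_store`'s actual API or implementation.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::ops::Range;
use std::path::Path;

/// Hypothetical helper (not part of object_store): read `range` from `path`
/// in chunks of up to `chunk_size` bytes, all within one synchronous call.
/// In async code the whole function would run inside a single blocking task
/// instead of spawning one blocking task per chunk.
fn read_range_in_chunks(
    path: &Path,
    range: Range<u64>,
    chunk_size: usize,
) -> io::Result<Vec<Vec<u8>>> {
    assert!(chunk_size > 0, "chunk size must be non-zero");
    let mut file = File::open(path)?;
    file.seek(SeekFrom::Start(range.start))?;
    // `take` stops reading at the end of the requested range.
    let mut limited = file.take(range.end.saturating_sub(range.start));
    let mut chunks = Vec::new();
    loop {
        let mut buf = Vec::with_capacity(chunk_size);
        // Fill up to `chunk_size` bytes; only the last chunk can be shorter.
        let n = limited.by_ref().take(chunk_size as u64).read_to_end(&mut buf)?;
        if n == 0 {
            break; // end of range or end of file
        }
        chunks.push(buf);
    }
    Ok(chunks)
}
```

   In an async context, the call would look roughly like `tokio::task::spawn_blocking(move || read_range_in_chunks(&path, range, 2 * 1024 * 1024))`. A real implementation would probably stream each chunk to the caller (for example, over a channel) instead of collecting the whole range in memory.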
   
   **Additional context**
   https://github.com/apache/datafusion/pull/21478#discussion_r3078305447
   


-- 
This is an automated message from the Apache Git Service.