ariel-miculas opened a new issue, #693: URL: https://github.com/apache/arrow-rs-object-store/issues/693
**Describe the bug**

See https://github.com/apache/datafusion/issues/21450

Root cause: there is a `spawn_blocking` call for every 8 KiB read from the file, which adds significant context-switch overhead.

**To Reproduce**

See https://github.com/apache/datafusion/issues/21446

For the tests I used a c7a.16xlarge EC2 instance with a version of hits.json trimmed down to 51G (the original is 217 GiB) and a warm page cache (populated by running `cat hits_50.json > /dev/null`).

**Expected behavior**

A more efficient implementation that reads larger chunks per blocking call (for comparison, tokio uses a 2 MiB buffer when reading files).

**Additional context**

https://github.com/apache/datafusion/pull/21478#discussion_r3078305447
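As a rough illustration (not object_store's actual code), the overhead described above scales with the number of blocking hand-offs, which is roughly file size divided by chunk size. The std-only sketch below uses hypothetical helpers `fill_chunk`, `read_in_chunks` and `handoffs`; each filled chunk stands in for one `spawn_blocking` round trip, and it assumes the "51G" test file means 51 GiB.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Chunk size the issue reports per `spawn_blocking` call today.
const SMALL_CHUNK: usize = 8 * 1024;
/// Buffer size the issue cites for tokio's file reads.
const LARGE_CHUNK: usize = 2 * 1024 * 1024;

/// Hypothetical helper: fills `buf` completely unless EOF is hit first.
/// Returns the number of bytes placed in `buf`.
fn fill_chunk(file: &mut File, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match file.read(&mut buf[filled..]) {
            Ok(0) => break, // EOF
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}

/// Hypothetical helper: reads the whole file in `chunk_size` pieces and
/// returns (bytes_read, chunk_count). In an async reader, each chunk would
/// correspond to one hand-off to the blocking thread pool.
fn read_in_chunks(path: &Path, chunk_size: usize) -> io::Result<(u64, u64)> {
    let mut file = File::open(path)?;
    let mut buf = vec![0u8; chunk_size];
    let (mut total, mut calls) = (0u64, 0u64);
    loop {
        let n = fill_chunk(&mut file, &mut buf)?;
        if n == 0 {
            break;
        }
        calls += 1;
        total += n as u64;
        if n < chunk_size {
            break; // a short fill means EOF was reached
        }
    }
    Ok((total, calls))
}

/// Number of chunk-sized hand-offs needed to read `file_size` bytes.
fn handoffs(file_size: u64, chunk: u64) -> u64 {
    file_size.div_ceil(chunk)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("object_store_chunk_demo.bin");
    std::fs::write(&path, vec![0u8; 1 << 20])?; // 1 MiB test file
    for chunk in [SMALL_CHUNK, LARGE_CHUNK] {
        let (bytes, calls) = read_in_chunks(&path, chunk)?;
        println!("chunk={chunk} bytes={bytes} blocking_calls={calls}");
    }
    std::fs::remove_file(&path)?;

    // Assumption: the trimmed hits.json is 51 GiB.
    let size: u64 = 51 * (1 << 30);
    println!(
        "51 GiB file: {} hand-offs at 8 KiB, {} at 2 MiB",
        handoffs(size, SMALL_CHUNK as u64),
        handoffs(size, LARGE_CHUNK as u64)
    );
    Ok(())
}
```

Under that assumption, 8 KiB chunks mean 6,684,672 blocking hand-offs for the file versus 26,112 with a 2 MiB buffer, a 256x reduction in round trips to the blocking pool.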
