mapleFU commented on issue #39005: URL: https://github.com/apache/arrow/issues/39005#issuecomment-1835592735
Yeah, Arrow has an internal `pre_buffer` config. Enabling it makes the Parquet reader issue all the necessary I/O up front and buffer the results in memory.

When reading a local file, Arrow might just call `ReadAt` for each small Parquet page, because it regards a local read as a lightweight operation. The same strategy might cause lots of `Get` calls against cloud storage. So with `pre_buffer` enabled, it tries to "collapse" the read requests: it merges adjacent ranges together to avoid fragmented read calls.
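To make the "collapse" idea concrete, here is a minimal sketch of range coalescing in plain Python. It is not Arrow's actual implementation: the function `coalesce_ranges` is hypothetical, and the parameters `hole_size_limit` and `range_size_limit` (and their default values) are assumptions chosen for illustration. Each input range stands for one small page read; ranges that are adjacent or separated by a small gap are merged into a single larger request.

```python
from typing import List, Tuple

# A read request as (offset, length) in bytes.
Range = Tuple[int, int]


def coalesce_ranges(
    ranges: List[Range],
    hole_size_limit: int = 8192,                 # assumed: largest gap worth reading through
    range_size_limit: int = 32 * 1024 * 1024,    # assumed: cap on a single merged request
) -> List[Range]:
    """Merge byte ranges that are adjacent or separated by a small gap.

    Illustrative sketch only; names and defaults are hypothetical.
    """
    merged: List[Range] = []
    for offset, length in sorted(ranges):
        if merged:
            prev_offset, prev_length = merged[-1]
            prev_end = prev_offset + prev_length
            new_end = max(prev_end, offset + length)
            gap = offset - prev_end  # negative when the ranges overlap
            if gap <= hole_size_limit and new_end - prev_offset <= range_size_limit:
                # Extend the previous request instead of issuing a new one.
                merged[-1] = (prev_offset, new_end - prev_offset)
                continue
        merged.append((offset, length))
    return merged


# Four page-sized reads: three close together, one far away.
page_reads = [(0, 100), (100, 200), (350, 50), (1_000_000, 4096)]
coalesced = coalesce_ranges(page_reads, hole_size_limit=64)
print(coalesced)  # four requests collapse into two
```

On cloud storage each remaining range would map to one `Get`, so the three nearby page reads cost one request instead of three. In pyarrow the option itself is exposed as the `pre_buffer` argument of `pyarrow.parquet.read_table`.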
