adriangb commented on PR #20481: URL: https://github.com/apache/datafusion/pull/20481#issuecomment-3980309044
We currently load the page index / bloom filter info for all row groups in a single IO request, right? I imagine the key is to make each IO operation large enough: if the page index metadata for a single row group is 2 kB, issuing a separate request for it is a waste of IO. If it's 4 MB, fetching 8 row groups x 4 MB in one request is roughly equivalent to making 8 individual 4 MB requests (the latter may even be faster). But that depends on the storage...
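To make the sizing trade-off concrete, here is a minimal sketch of one possible request planner. It assumes each row group's page index / bloom filter bytes are known as a byte range. It merges small, nearby ranges into one request and stops merging once a request would exceed a size cap, so large ranges stay as separate requests. `plan_requests`, `max_gap_bytes` and `max_request_bytes` are hypothetical names, and the thresholds in `main` are illustrative assumptions rather than DataFusion's actual behavior. As noted above, the right values depend on the storage.

```rust
use std::ops::Range;

/// Hypothetical helper (not a DataFusion API): plan IO requests for
/// per-row-group metadata byte ranges (e.g. page index or bloom filter bytes).
/// Ranges separated by at most `max_gap_bytes` are merged into one request,
/// but only while the merged request stays within `max_request_bytes`.
fn plan_requests(
    ranges: &[Range<u64>],
    max_gap_bytes: u64,
    max_request_bytes: u64,
) -> Vec<Range<u64>> {
    // Sort by start offset so that only neighbouring ranges are merge candidates.
    let mut sorted: Vec<Range<u64>> = ranges.to_vec();
    sorted.sort_by_key(|r| r.start);

    let mut out: Vec<Range<u64>> = Vec::new();
    for r in sorted {
        if let Some(last) = out.last_mut() {
            // Bytes we would read but not need if we merged (0 if overlapping).
            let gap = r.start.saturating_sub(last.end);
            // Total size of the request if `r` were folded into `last`.
            let merged_len = r.end.max(last.end) - last.start;
            if gap <= max_gap_bytes && merged_len <= max_request_bytes {
                last.end = last.end.max(r.end);
                continue;
            }
        }
        // Too far away or too big: start a new request.
        out.push(r);
    }
    out
}

fn main() {
    const KB: u64 = 1024;
    const MB: u64 = 1024 * KB;

    // Assumed layout: 8 row groups with ~2 kB of page index each, stored contiguously.
    let small: Vec<Range<u64>> = (0..8u64).map(|i| (i * 2 * KB)..((i + 1) * 2 * KB)).collect();
    // Assumed layout: 8 row groups with ~4 MB of page index each.
    let large: Vec<Range<u64>> = (0..8u64).map(|i| (i * 4 * MB)..((i + 1) * 4 * MB)).collect();

    // Illustrative thresholds: tolerate 64 kB gaps, cap each request at 4 MB.
    let small_plan = plan_requests(&small, 64 * KB, 4 * MB);
    let large_plan = plan_requests(&large, 64 * KB, 4 * MB);

    println!("small: {} request(s)", small_plan.len()); // 1 request of 16 kB
    println!("large: {} request(s)", large_plan.len()); // 8 requests of 4 MB
}
```

With these assumed thresholds, the eight 2 kB ranges collapse into a single 16 kB request, while the eight 4 MB ranges stay as eight independent requests that could be issued concurrently.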
