pitrou commented on pull request #11588: URL: https://github.com/apache/arrow/pull/11588#issuecomment-960614235
I'd like to reboot the discussion and stop discussing flag combinations without regard for the original issue. Here is the complaint:

> In my test, if the access pattern is random access (binary searching an array in a memory mapped arrow IPC file in my case), I find OS (Linux) will prefetch data, and lots of IO are wasted (90% in my test), page cache is full of never used data as well.

So, to sum it up:

* the Arrow IPC layer issues `madvise` calls for record batches that are read by the user, so that the OS prefetches them in the background
* here, the user doesn't _want_ the record batches to be prefetched, because they are only doing very sparse reads and ignoring most of the remaining data

@niyue Am I right?
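To make the two access patterns summarized above concrete, here is a minimal sketch using Python's standard `mmap` module (Python 3.8+, on a platform that exposes `madvise`). It is not Arrow's code: the helper names `open_mapped`, `binary_search_i64`, and `demo`, and the flat int64 file layout, are assumptions made for illustration. `MADV_WILLNEED` asks the kernel to read the range ahead of time, which suits sequential consumption of whole record batches, while `MADV_RANDOM` tells it to expect random access, so read-ahead is likely to be of little use.

```python
import mmap
import os
import struct
import tempfile

# Illustrative layout only (not Arrow IPC): a flat file of sorted little-endian int64 values.
ITEM = struct.Struct("<q")


def open_mapped(path, random_access):
    """Map `path` read-only and hint the kernel about the expected access pattern."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # madvise() and the MADV_* constants exist only where the platform supports them.
    name = "MADV_RANDOM" if random_access else "MADV_WILLNEED"
    advice = getattr(mmap, name, None)
    if advice is not None and hasattr(mm, "madvise"):
        mm.madvise(advice)  # applies to the whole mapping
    return mm


def binary_search_i64(mm, target):
    """Return the index of the first value >= target (a lower bound)."""
    lo, hi = 0, len(mm) // ITEM.size
    while lo < hi:
        mid = (lo + hi) // 2
        (value,) = ITEM.unpack_from(mm, mid * ITEM.size)  # touches a single page
        if value < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


def demo(random_access=True):
    """Write 100,000 even numbers to a temp file and binary-search it for 10."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"".join(ITEM.pack(v) for v in range(0, 200_000, 2)))
        mm = open_mapped(path, random_access)
        try:
            return binary_search_i64(mm, 10)
        finally:
            mm.close()
    finally:
        os.remove(path)
```

A binary search like this touches only about log2(n) pages, so a prefetch hint covering the whole mapped range would pull in data that is mostly never read, which matches the wasted IO described in the complaint.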
