westonpace commented on pull request #11588: URL: https://github.com/apache/arrow/pull/11588#issuecomment-961333636
For context, it is probably worth pointing out that @niyue recently added #11486, which gets around the classic "IPC reader reads the entire file even if you only want a few columns" issue.

However, I agree with @pitrou. It sounds like you are not just limiting which columns you access but are also accessing very few rows. In that case the problem is likely that the record batch file reader loads the entire array via `ArrayLoader => GetBuffer => ReadBuffer => RandomAccessFile::ReadAt(entire-buffer-range)`, and in `MemoryMappedFile::ReadAt` we call `::arrow::internal::MemoryAdviseWillNeed` on the entire range accessed, so the OS is asked to page in the whole buffer even though only a few rows are read.

In that case, the solution is what Antoine suggested: we should provide an option in `MemoryMappedFile` to prevent calls to madvise.
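
To make the idea concrete, here is a rough sketch of what such an option could look like, written against plain POSIX rather than Arrow. `MappedFileSketch`, its `advise_will_need` flag, and `advise_calls()` are hypothetical names used only for illustration; this is not Arrow's `MemoryMappedFile` and not a proposed patch. It only shows a zero-copy `ReadAt` whose "will need" hint can be switched off, so that only the pages a caller actually touches get faulted in.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a read-only memory-mapped file (not Arrow code).
class MappedFileSketch {
 public:
  // advise_will_need is the assumed option: when false, ReadAt issues no
  // prefetch hint and pages are loaded lazily on first access.
  MappedFileSketch(const std::string& path, bool advise_will_need)
      : advise_will_need_(advise_will_need) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed: " + path);
    struct stat st;
    if (::fstat(fd, &st) != 0 || st.st_size == 0) {
      ::close(fd);
      throw std::runtime_error("fstat failed or empty file: " + path);
    }
    size_ = static_cast<size_t>(st.st_size);
    void* addr = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // the mapping remains valid after the descriptor is closed
    if (addr == MAP_FAILED) throw std::runtime_error("mmap failed: " + path);
    data_ = static_cast<uint8_t*>(addr);
  }
  ~MappedFileSketch() { ::munmap(data_, size_); }
  MappedFileSketch(const MappedFileSketch&) = delete;
  MappedFileSketch& operator=(const MappedFileSketch&) = delete;

  // Zero-copy "read": returns a pointer into the mapping for [offset, offset+length).
  const uint8_t* ReadAt(size_t offset, size_t length) {
    if (offset > size_ || length > size_ - offset) {
      throw std::out_of_range("ReadAt range exceeds file size");
    }
    if (advise_will_need_ && length > 0) {
      const long page_size = ::sysconf(_SC_PAGESIZE);
      assert(page_size > 0);
      const size_t page = static_cast<size_t>(page_size);
      // The hint must start on a page boundary, so round the offset down.
      const size_t start = offset - (offset % page);
      ::posix_madvise(data_ + start, length + (offset - start),
                      POSIX_MADV_WILLNEED);
      ++advise_calls_;
    }
    return data_ + offset;
  }

  size_t size() const { return size_; }
  size_t advise_calls() const { return advise_calls_; }  // for illustration only

 private:
  uint8_t* data_ = nullptr;
  size_t size_ = 0;
  bool advise_will_need_;
  size_t advise_calls_ = 0;
};
```

With `advise_will_need` set to false, a caller that asks for a large buffer range but only dereferences a handful of bytes should touch just those pages; with it set to true, the kernel is hinted to read ahead the whole requested range, which is the behavior described above.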
