westonpace commented on pull request #11486: URL: https://github.com/apache/arrow/pull/11486#issuecomment-954355732
> ArrayLoader involves quite a lot arrow structures, and I am not familiar with some of them, so I try to follow current organization to make it work so far.

Ok. That is fine. Thank you for considering.

> I think probably we can close ARROW-12683 and I will create a JIRA issue to track the async version of the reader enhancement as follow-up. What do you think?

Sounds great.

> In my test under Linux, I found Linux will do read ahead IO...

I did some testing with `POSIX_FADV_WILLNEED` and didn't ever see much benefit over Linux's built-in readahead.

> I don't look into how S3FileSystem handles this

It does not currently handle this. We get pretty poor performance with the IPC reader on S3 because there is no readahead / batching (and there is a high latency per request).

Handling this at the filesystem level is an interesting thought. The challenge will be that the filesystem is parallel, so we sometimes want to allow multiple reads (instead of queuing and plugging/merging), but the filesystem doesn't know the access pattern. Maybe we can still come up with a good strategy. We have ARROW-14429 for this already, so no need to solve this problem right now.
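
To make the "plugging/merging" idea concrete, here is a minimal sketch of merging nearby byte-range reads before they are issued to a high-latency store. It is only an illustration, not how S3FileSystem or the IPC reader works today. The function name `coalesce_ranges` and the `hole_size_limit` parameter are hypothetical names chosen for this example.

```python
from typing import List, Tuple

# A read request expressed as (offset, length) in bytes.
ReadRange = Tuple[int, int]


def coalesce_ranges(ranges: List[ReadRange], hole_size_limit: int) -> List[ReadRange]:
    """Merge reads whose gap is at most hole_size_limit bytes.

    Hypothetical helper for illustration: fewer, larger requests trade some
    wasted bytes (the holes) for fewer round trips.
    """
    merged: List[ReadRange] = []
    for offset, length in sorted(ranges):
        if length <= 0:
            continue  # skip empty reads
        if merged:
            prev_offset, prev_length = merged[-1]
            prev_end = prev_offset + prev_length
            # A negative gap means the two reads overlap; merge those too.
            if offset - prev_end <= hole_size_limit:
                new_end = max(prev_end, offset + length)
                merged[-1] = (prev_offset, new_end - prev_offset)
                continue
        merged.append((offset, length))
    return merged


if __name__ == "__main__":
    requests = [(0, 100), (120, 50), (10_000, 200)]
    print(coalesce_ranges(requests, hole_size_limit=64))
```

The first two requests are 20 bytes apart and collapse into one, while the third stays separate. A sketch like this only works when the caller knows all the ranges up front, which is the difficulty noted above: the filesystem layer sees requests one at a time and cannot tell whether waiting to merge them would help or hurt.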
