alamb opened a new issue, #21159:
URL: https://github.com/apache/datafusion/issues/21159
### Is your feature request related to a problem or challenge?
While looking at traces for the morsel-driven scan
- #20529
@Dandandan and I noticed that a substantial amount of work is potentially
done on other blocking threads just to read data.
I was thinking that for local files, we might do substantially better using
mmap / the kernel page cache.
### Describe the solution you'd like
Specifically, a new wrapper for ObjectStore that will memory map files when
they are opened, and then provide a way to read (using zero copy
`Bytes::slice`) from those memory mapped files.
I am not sure it would actually be faster -- so the first thing to do would
be to code it up and try it out / see how fast we can get it
### Describe alternatives you've considered
Use the memmap2 crate as shown in this example:
https://github.com/apache/arrow-rs/blob/main/arrow/examples/zero_copy_ipc.rs
Open the file, memory map it, and then turn it into `Bytes` that is then
zero copied when requested. At first, keep all mmapped files open. We will
eventually implement a cap / LRU for the number of open mmaps, but to start
we can just keep them all open and see how that goes.
https://github.com/apache/arrow-rs/blob/main/arrow/examples/zero_copy_ipc.rs#L46-L47
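
For the eventual cap / LRU on open mmaps, something along these lines might work. This is a std-only sketch; `LruMappings` and its methods are hypothetical names, and `V` stands in for whatever handle represents a mapped file (for example a `Bytes`):

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical cache of mapped files keyed by path, capped at `capacity`.
pub struct LruMappings<V> {
    capacity: usize,
    entries: HashMap<String, V>,
    // Front = least recently used, back = most recently used.
    order: VecDeque<String>,
}

impl<V: Clone> LruMappings<V> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            capacity,
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Return the cached handle for `path`, or create one with `open`,
    /// evicting the least recently used entry if the cache is full.
    pub fn get_or_insert_with<E>(
        &mut self,
        path: &str,
        open: impl FnOnce() -> Result<V, E>,
    ) -> Result<V, E> {
        if let Some(v) = self.entries.get(path) {
            let v = v.clone();
            self.touch(path);
            return Ok(v);
        }
        let v = open()?;
        if self.entries.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
        self.entries.insert(path.to_string(), v.clone());
        self.order.push_back(path.to_string());
        Ok(v)
    }

    /// Mark `path` as most recently used (linear scan; fine for a sketch).
    fn touch(&mut self, path: &str) {
        if let Some(pos) = self.order.iter().position(|p| p == path) {
            if let Some(key) = self.order.remove(pos) {
                self.order.push_back(key);
            }
        }
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    pub fn contains(&self, path: &str) -> bool {
        self.entries.contains_key(path)
    }
}
```

If `V` is a reference-counted handle such as `Bytes`, eviction only drops the cache's own reference; any slices already handed out keep the mapping alive until they are dropped too.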
Then we will wire this into datafusion-cli here so it is used for file URLs:
https://github.com/alamb/datafusion/blob/7ef62b988d19c75e737b57f1491cfc1cd9222466/datafusion-cli/src/object_storage.rs#L567-L566
I think the mmap file object store should not wrap another object store;
instead it could implement all functions directly. Use the LocalFileSystem as
an example for how to implement the various methods if needed.
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]