alamb opened a new issue, #21159:
URL: https://github.com/apache/datafusion/issues/21159

   ### Is your feature request related to a problem or challenge?
   
   While looking at traces for the morsel-driven scan
   - #20529 
   
   @Dandandan and I noticed that a substantial amount of work appears to be 
done on other blocking threads just to read data
   
   I was thinking that for local files, we might do substantially better using 
mmap / the kernel page cache
   
   ### Describe the solution you'd like
   
   Specifically, a new wrapper for ObjectStore that will memory map files when 
they are opened, and then provide a way to read (using zero copy 
`Bytes::slice`) from those memory mapped files.
   
   I am not sure it would actually be faster -- so the first thing to do would 
be to code it up and try it out / see how fast we can get it
   
   
   
   
   ### Describe alternatives you've considered
   
     Use the `memmap2` crate as shown in this example: 
https://github.com/apache/arrow-rs/blob/main/arrow/examples/zero_copy_ipc.rs
   
     Open the file, memory map it, and then turn it into `Bytes` that is then 
zero copied when requested. At first, keep all mmapped files open. We will 
eventually implement a cap / LRU for the number of open mmaps, but to start 
we can just keep them all open and see how that goes.
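
   The approach above (open each file once, keep it resident, hand out 
zero-copy slices) could be sketched roughly as follows. This is a std-only 
illustration under stated assumptions, not the proposed implementation: 
`std::fs::read` stands in for the memory map, `Arc<[u8]>` stands in for 
`Bytes`, and `ResidentFileStore`, `SharedSlice`, and `get_range` are 
hypothetical names. A real version would presumably map the file with 
`memmap2` and wrap the mapping in `Bytes`, along the lines of the linked 
arrow-rs example.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::ops::Range;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex};

/// A cheap, cloneable view into a shared buffer (stand-in for `Bytes`).
#[derive(Clone, Debug)]
pub struct SharedSlice {
    buf: Arc<[u8]>,
    range: Range<usize>,
}

impl SharedSlice {
    /// Borrow the bytes covered by this view.
    pub fn as_bytes(&self) -> &[u8] {
        &self.buf[self.range.clone()]
    }

    /// True if both views point at the same underlying allocation.
    pub fn shares_buffer_with(&self, other: &SharedSlice) -> bool {
        Arc::ptr_eq(&self.buf, &other.buf)
    }
}

/// Hypothetical store that keeps every opened file resident (no cap / LRU yet).
#[derive(Default)]
pub struct ResidentFileStore {
    files: Mutex<HashMap<PathBuf, Arc<[u8]>>>,
}

impl ResidentFileStore {
    /// Load the file on first access, then reuse the same buffer afterwards.
    fn open(&self, path: &Path) -> io::Result<Arc<[u8]>> {
        let mut files = self.files.lock().unwrap();
        if let Some(buf) = files.get(path) {
            return Ok(Arc::clone(buf));
        }
        // Assumption: `fs::read` stands in for memory mapping the file.
        let buf: Arc<[u8]> = Arc::from(fs::read(path)?);
        files.insert(path.to_path_buf(), Arc::clone(&buf));
        Ok(buf)
    }

    /// Return a view of `range` that shares the resident buffer (no byte copy).
    pub fn get_range(&self, path: &Path, range: Range<usize>) -> io::Result<SharedSlice> {
        let buf = self.open(path)?;
        if range.start > range.end || range.end > buf.len() {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "range out of bounds",
            ));
        }
        Ok(SharedSlice { buf, range })
    }
}
```

   In this sketch, two range requests against the same path share one 
buffer, so after the first load a request costs a map lookup and a 
reference-count increment rather than a read on a blocking thread.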
   
     
https://github.com/apache/arrow-rs/blob/main/arrow/examples/zero_copy_ipc.rs#L46-L47
   
     Then we will wire this into datafusion-cli here so it is used for file URLs
   
     
https://github.com/alamb/datafusion/blob/7ef62b988d19c75e737b57f1491cfc1cd9222466/datafusion-cli/src/object_storage.rs#L567-L566
   
   I think the mmap file object store should not wrap another object store, but 
should instead implement all functions directly. Use `LocalFileSystem` as 
an example of how to implement the various methods if needed
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

