ozgrakkurt opened a new issue, #5240:
URL: https://github.com/apache/arrow-rs/issues/5240

   **Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
   I want to read parquet files with lower latency and better throughput.
[Glommio](https://github.com/DataDog/glommio) is a thread-per-core library that
uses io_uring and direct IO. Direct IO is particularly attractive for my case,
and I imagine for many other parquet use cases: my use case does not benefit at
all from caching the parquet files in the Linux page cache, and it even suffers
from it. I also store the parquet files on fast NVMe disks, so direct IO and
io_uring can give a big performance boost.
   
   
   https://itnext.io/modern-storage-is-plenty-fast-it-is-the-apis-that-are-bad-6a68319fbc1a
   https://www.phoronix.com/news/OpenZFS-DirectIO-Performance
   
   **Describe the solution you'd like**
   Have an alternative IO implementation for parquet, similar to the existing
async feature (which uses tokio).
   
   It would provide glommio_reader, glommio_writer and so on, analogous to
async_reader, async_writer and so on.
   
   **Describe alternatives you've considered**
   I could open the file with the O_DIRECT flag and read it with the current
async or sync implementation, but that will fail because of unaligned buffers
etc.
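
   To illustrate the alignment problem, here is a minimal sketch of the
bookkeeping a plain O_DIRECT reader would need. It assumes a 4096-byte
alignment requirement (the real value depends on the device and filesystem),
and `ALIGN`, `align_range` and `AlignedBuf` are hypothetical names, not part of
parquet or glommio. It only widens a requested byte range to block boundaries
and allocates a block-aligned buffer; it does not perform any IO.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::ops::Range;

/// Assumed alignment for direct IO; real devices may require a different value.
const ALIGN: usize = 4096;

/// Widen a byte range so that both its start and end fall on ALIGN boundaries.
/// The caller later slices the originally requested bytes out of the result.
fn align_range(range: &Range<u64>) -> Range<u64> {
    let a = ALIGN as u64;
    let start = range.start / a * a; // round down
    let end = (range.end + a - 1) / a * a; // round up
    start..end
}

/// Heap buffer whose address and length are both multiples of ALIGN.
struct AlignedBuf {
    ptr: *mut u8,
    layout: Layout,
}

impl AlignedBuf {
    /// Allocate a zeroed buffer of at least `len` bytes, rounded up to ALIGN.
    fn new(len: usize) -> Self {
        let len = (len.max(1) + ALIGN - 1) / ALIGN * ALIGN;
        let layout = Layout::from_size_align(len, ALIGN).expect("valid layout");
        // SAFETY: the layout has a non-zero size.
        let ptr = unsafe { alloc_zeroed(layout) };
        assert!(!ptr.is_null(), "allocation failed");
        Self { ptr, layout }
    }

    /// View the whole buffer as a mutable byte slice.
    fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: ptr is valid for layout.size() bytes and uniquely owned.
        unsafe { std::slice::from_raw_parts_mut(self.ptr, self.layout.size()) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: ptr was allocated with exactly this layout.
        unsafe { dealloc(self.ptr, self.layout) }
    }
}
```

   A reader built on this would request `align_range(&wanted)` from the file
into an `AlignedBuf` and then copy out only the wanted bytes, which is extra
work the existing readers do not do today.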
   
   Just using glommio should be better, since there is no need to implement
buffer alignment etc. ourselves. Glommio also already provides an io_uring,
thread-per-core architecture, which is very nice for building database-like
systems.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
