Jenovesan opened a new issue, #42153:
URL: https://github.com/apache/arrow/issues/42153

   ### Describe the usage question you have. Please include as many useful 
details as possible.
   
   
   ### Program Goal
   Hello, 
   
   For my program, I am reading files sequentially. To speed it up, I want to 
preload the files asynchronously into a container so that they are already in 
memory by the time my program requests them.
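   
   To make the goal concrete, here is a minimal sketch of the prefetching 
pattern using only the C++ standard library. `LoadFile`, `Prefetcher`, and 
`depth` are hypothetical names for illustration, not Arrow APIs; `LoadFile` 
stands in for whatever actually reads a .feather file.

```cpp
#include <cstddef>
#include <deque>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real reader (assumption, not an Arrow call).
// It returns a placeholder string instead of actual file contents.
std::string LoadFile(const std::string& path) { return "contents of " + path; }

// Keeps up to `depth` loads in flight ahead of the consumer and hands the
// results back in the same order the paths were given.
class Prefetcher {
 public:
  Prefetcher(std::vector<std::string> paths, std::size_t depth)
      : paths_(std::move(paths)), depth_(depth) {
    Fill();
  }

  bool HasNext() const { return !pending_.empty(); }

  // Blocks only if the next file has not finished loading yet.
  std::string Next() {
    std::string result = pending_.front().get();
    pending_.pop_front();
    Fill();  // Start another load to keep the window full.
    return result;
  }

 private:
  // Launches background loads until `depth_` are pending or paths run out.
  void Fill() {
    while (pending_.size() < depth_ && next_ < paths_.size()) {
      pending_.push_back(
          std::async(std::launch::async, LoadFile, paths_[next_++]));
    }
  }

  std::vector<std::string> paths_;
  std::size_t depth_;
  std::size_t next_ = 0;
  std::deque<std::future<std::string>> pending_;
};
```

   The consumer would loop with `while (p.HasNext()) { auto f = p.Next(); }`. 
A `depth` of 0 launches nothing, so it should be at least 1.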
   
   ### Solution?
   I've been scouring the docs and code, and I think the best way to do this 
would be to have a `Dataset` containing the individual files as `RecordBatch`es 
and then use `Dataset::NewScan` to scan the whole dataset one `RecordBatch` at 
a time. As soon as a `RecordBatch` is read, I can store it in the container.
   
   ### Additional Information
   Files are memory-mapped .feather files.
   In my dataset, there are thousands of files. Each file is either ~110KB or 
~5KB in size. 
   
   ### Conclusion
   If someone could let me know whether this is the best way to achieve what I 
want, or guide me in a better direction, that would be great.
   
   Any advice would be greatly appreciated,
   Thanks
   
   ### Component(s)
   
   C++


