2010YOUY01 opened a new issue, #6922:
URL: https://github.com/apache/arrow-datafusion/issues/6922

   ### Is your feature request related to a problem or challenge?
   
   This issue tracks the remaining tasks from the initial parallel CSV scan 
PR: https://github.com/apache/arrow-datafusion/pull/6801
   
   The remaining tasks:
   1. Use `get_opts()` for range reads on the local FS
   `get_opts()` is the `ObjectStore` interface for streaming range reads 
(local FS / cloud storage). Range reads on the local FS are not supported yet: 
https://github.com/apache/arrow-rs/blob/0d4e6a727f113f42d58650d2dbecab89b22d4e28/object_store/src/lib.rs#L355
   Once this is implemented in `arrow-rs`, we can use it in the parallel CSV 
scan implementation and possibly get some performance improvement (the current 
implementation copies the whole CSV file range into memory at once instead 
of reading it in a streaming fashion).
   2. Use only one get operation from `ObjectStore` per partition instead of 
three (see the original PR discussion)
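
To illustrate the difference task 1 is after, here is a minimal sketch of reading a byte range of a local file in bounded chunks rather than copying the whole range into memory at once. It uses only the Rust standard library and is not the `object_store` API; the function name `stream_range` and the `chunk_size` parameter are hypothetical and exist only for this example.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::ops::Range;
use std::path::Path;

/// Hypothetical helper: streams the bytes in `range` of the file at `path`,
/// handing them to `on_chunk` in pieces of at most `chunk_size` bytes.
/// Only one chunk-sized buffer is held in memory at any time.
/// Returns the total number of bytes delivered.
fn stream_range<F: FnMut(&[u8])>(
    path: &Path,
    range: Range<u64>,
    chunk_size: usize,
    mut on_chunk: F,
) -> io::Result<u64> {
    let mut file = File::open(path)?;
    // Jump straight to the start of the requested range.
    file.seek(SeekFrom::Start(range.start))?;
    // `take` caps the reader at the length of the range, so we never read past `range.end`.
    let mut limited = file.take(range.end.saturating_sub(range.start));

    let mut buf = vec![0u8; chunk_size];
    let mut total = 0u64;
    loop {
        let n = limited.read(&mut buf)?;
        if n == 0 {
            break; // end of range or end of file
        }
        on_chunk(&buf[..n]);
        total += n as u64;
    }
    Ok(total)
}
```

With a streaming reader like this, a CSV decoder can start consuming bytes while the rest of the range is still being read, and peak memory is bounded by `chunk_size` rather than by the size of the partition.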
   
   Task 2 is easier to do after task 1 is done, because it can then be tested 
on the local filesystem.
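
As a rough illustration of what a single read per partition could look like, the sketch below aligns a byte-range partition to line boundaries in one sequential pass. This is an assumption made for illustration, not the approach from the PR discussion: the function `read_partition` is hypothetical, it assumes a line belongs to the partition that contains the line's first byte, and it ignores newlines inside quoted CSV fields.

```rust
use std::io::{self, BufRead, BufReader, Read, Seek, SeekFrom};

/// Hypothetical helper: returns the lines whose first byte lies in `[start, end)`,
/// using one seek followed by one sequential read.
/// Assumption: `\n` always terminates a record (no quoted newlines).
fn read_partition<R: Read + Seek>(source: R, start: u64, end: u64) -> io::Result<Vec<Vec<u8>>> {
    let mut reader = BufReader::new(source);

    // Start one byte early so we can tell whether `start` is the beginning of a line.
    let mut pos = start.saturating_sub(1);
    reader.seek(SeekFrom::Start(pos))?;

    if start > 0 {
        // Discard everything up to and including the first newline. If byte `start - 1`
        // is itself a newline, this drops only that byte; otherwise it drops the tail
        // of a line that belongs to the previous partition.
        let mut skipped = Vec::new();
        pos += reader.read_until(b'\n', &mut skipped)? as u64;
    }

    let mut lines = Vec::new();
    // Keep every line that starts before `end`, reading past `end` to finish the last one.
    while pos < end {
        let mut line = Vec::new();
        let n = reader.read_until(b'\n', &mut line)?;
        if n == 0 {
            break; // end of file
        }
        pos += n as u64;
        lines.push(line);
    }
    Ok(lines)
}
```

Because each line is claimed by exactly one partition under this convention, adjacent ranges such as `[0, 6)`, `[6, 11)` and `[11, 16)` together yield every line once. A local file or an in-memory `std::io::Cursor` can stand in for the object store when testing this kind of logic.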
   
   ### Describe the solution you'd like
   
   _No response_
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]