2010YOUY01 opened a new issue, #6922:
URL: https://github.com/apache/arrow-datafusion/issues/6922
### Is your feature request related to a problem or challenge?

This issue tracks the remaining tasks from the initial parallel CSV scan PR: https://github.com/apache/arrow-datafusion/pull/6801

The remaining tasks:

1. Use `get_opts()` for range reads on the local FS

   `get_opts()` is the `ObjectStore` interface for streaming range reads (local FS / cloud storage), but the local FS implementation currently does not support range reads through it: https://github.com/apache/arrow-rs/blob/0d4e6a727f113f42d58650d2dbecab89b22d4e28/object_store/src/lib.rs#L355

   Once this is implemented in `arrow-rs`, we can use it in the parallel CSV scan implementation and possibly get some performance improvement (the current implementation copies the whole CSV file range into memory at once instead of streaming it).

2. Use only 1 get operation on the `ObjectStore` for each partition instead of 3 (see the original PR discussion).

   Task 2 is easier to do after task 1 is done, because it can then be tested on the local filesystem.

### Describe the solution you'd like

_No response_

### Describe alternatives you've considered

_No response_

### Additional context

_No response_
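A minimal sketch of how tasks 1 and 2 could fit together, using only the Rust standard library rather than the `object_store` crate: each partition is read with one seek plus sequential, bounded-size chunk reads, and the line-boundary alignment happens inside that same pass instead of through separate requests. The names `stream_partition` and `chunk_size` are hypothetical, the local `Read + Seek` source stands in for a ranged `get_opts()` stream, and the boundary convention (a partition with `start > 0` skips up to and including the first `\n` at or after `start`, then reads through the first `\n` at or after `end`) is an assumption for this sketch. It also assumes no quoted CSV field contains a newline.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Hypothetical helper: streams the complete lines owned by the byte range
/// `[start, end)` of `source` into `sink` using one seek and sequential reads.
///
/// Assumed convention (illustration only): a partition with `start > 0`
/// discards bytes up to and including the first `\n` at or after `start`,
/// then emits bytes through the first `\n` at or after `end` (or EOF).
pub fn stream_partition<R: Read + Seek, W: Write>(
    source: &mut R,
    start: u64,
    end: u64,
    chunk_size: usize,
    sink: &mut W,
) -> io::Result<()> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    if start >= end {
        return Ok(());
    }
    // A single positioned read replaces separate boundary lookups.
    source.seek(SeekFrom::Start(start))?;
    let mut buf = vec![0u8; chunk_size];
    let mut pos = start; // absolute file offset of buf[0]
    let mut skipping = start > 0; // still discarding the previous partition's last line
    loop {
        let n = source.read(&mut buf)?;
        if n == 0 {
            return Ok(()); // EOF: the final line may lack a trailing newline
        }
        let chunk = &buf[..n];
        let mut from = 0;
        if skipping {
            match chunk.iter().position(|&b| b == b'\n') {
                // The first boundary lies at or after `end`: this partition owns no line.
                Some(nl) if pos + nl as u64 >= end => return Ok(()),
                Some(nl) => {
                    skipping = false;
                    from = nl + 1; // first byte of the first owned line
                }
                None => {
                    pos += n as u64;
                    continue;
                }
            }
        }
        // Index in this chunk of the first byte at offset >= end, clamped to the chunk.
        let end_idx = end.saturating_sub(pos).min(n as u64) as usize;
        let search_from = from.max(end_idx);
        match chunk[search_from..].iter().position(|&b| b == b'\n') {
            Some(k) => {
                // Emit through the terminating newline of the last owned line.
                sink.write_all(&chunk[from..=search_from + k])?;
                return Ok(());
            }
            None => {
                sink.write_all(&chunk[from..])?;
                pos += n as u64;
            }
        }
    }
}
```

Because each partition stops at the same newline where the next partition starts, concatenating the outputs of adjacent partitions should reproduce the file with every line emitted exactly once, regardless of chunk size; a partition that falls entirely inside one long line simply emits nothing.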
