tustvold opened a new issue, #2935:
URL: https://github.com/apache/arrow-datafusion/issues/2935

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   Currently `CsvOpener` and `JsonOpener` call 
[GetResult::bytes](https://docs.rs/object_store/latest/object_store/enum.GetResult.html#method.bytes)
 which downloads the entire file, prior to feeding it to the appropriate arrow 
reader.
   
   This is not ideal:
   
   * Adds decode latency, because the full payload must be buffered before decoding can begin
   * May read more data than necessary (#2930)
   
   Following on from #2677, we now support streaming responses from object 
storage.
   
   **Describe the solution you'd like**
   
   The underlying challenge is to take an arbitrary `Stream<Bytes>` and convert 
it into a `Stream<Bytes>` where each stream element contains only complete 
rows, as delimited by a newline character. Once we have this `DelimitedStream`, 
it is trivial to feed each of these byte chunks individually into the 
corresponding decoder.
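
   As a rough illustration of the re-chunking idea, here is a minimal 
synchronous sketch. It is not the proposed implementation: it assumes a plain 
`Iterator<Item = Vec<u8>>` in place of an async `Stream<Bytes>`, the name 
`DelimitedChunks` is hypothetical, and it treats every `\n` byte as a row 
boundary, so it ignores newlines inside quoted CSV fields.

```rust
/// Hypothetical helper: re-chunks arbitrary byte buffers so that every
/// yielded buffer ends on a newline, except possibly the final one.
struct DelimitedChunks<I> {
    input: I,
    /// Trailing partial row carried over from earlier chunks.
    remainder: Vec<u8>,
    /// Set once the input is exhausted and the remainder has been flushed.
    done: bool,
}

impl<I: Iterator<Item = Vec<u8>>> DelimitedChunks<I> {
    fn new(input: I) -> Self {
        Self { input, remainder: Vec::new(), done: false }
    }
}

impl<I: Iterator<Item = Vec<u8>>> Iterator for DelimitedChunks<I> {
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Vec<u8>> {
        if self.done {
            return None;
        }
        loop {
            match self.input.next() {
                Some(chunk) => match chunk.iter().rposition(|&b| b == b'\n') {
                    // The chunk contains at least one row terminator: emit
                    // the carried-over bytes plus everything up to and
                    // including the last newline, and keep the tail.
                    Some(pos) => {
                        let mut out = std::mem::take(&mut self.remainder);
                        out.extend_from_slice(&chunk[..=pos]);
                        self.remainder.extend_from_slice(&chunk[pos + 1..]);
                        return Some(out);
                    }
                    // No newline yet: keep buffering the partial row.
                    None => self.remainder.extend_from_slice(&chunk),
                },
                // Input exhausted: flush any unterminated final row.
                None => {
                    self.done = true;
                    if self.remainder.is_empty() {
                        return None;
                    }
                    return Some(std::mem::take(&mut self.remainder));
                }
            }
        }
    }
}
```

   Each yielded buffer could then be handed to a decoder on its own. Only the 
partial row at the end of a chunk is held back, so memory use is bounded by 
the chunk size plus the longest row rather than by the file size.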
   
   **Describe alternatives you've considered**
   
   We could leave the current behaviour as it is.
   

