[GitHub] [arrow-rs] tustvold commented on issue #3740: Support for Async CSV Writer

via GitHub Thu, 23 Feb 2023 05:31:59 -0800


tustvold commented on issue #3740:
URL: https://github.com/apache/arrow-rs/issues/3740#issuecomment-1441803061


   > If you are also satisfied with the result of buffered version
   
   Performance across all the variants seems broadly comparable. It would be interesting to see a profile, but I suspect the difference comes from the size of the intermediate buffer, and the optimal size will depend heavily on the destination sink.
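   To make the buffer-size point concrete, here is a minimal std-only sketch that accumulates encoded bytes and only hands them to the sink once a configurable `threshold` is reached. `ThresholdBuffer` and its methods are hypothetical and only for illustration; they are not an arrow-rs API. The right threshold for a real sink would still need to be found by profiling.
   
   ```rust
   /// Hypothetical helper (not an arrow-rs API): accumulates encoded bytes
   /// and passes them to `sink` once at least `threshold` bytes are pending.
   struct ThresholdBuffer<F: FnMut(&[u8])> {
       buf: Vec<u8>,
       threshold: usize,
       sink: F,
   }
   
   impl<F: FnMut(&[u8])> ThresholdBuffer<F> {
       fn new(threshold: usize, sink: F) -> Self {
           Self { buf: Vec::with_capacity(threshold), threshold, sink }
       }
   
       /// Append encoded bytes; flush to the sink once the threshold is reached.
       fn push(&mut self, bytes: &[u8]) {
           self.buf.extend_from_slice(bytes);
           if self.buf.len() >= self.threshold {
               self.flush();
           }
       }
   
       /// Hand any pending bytes to the sink and reset the buffer.
       fn flush(&mut self) {
           if !self.buf.is_empty() {
               (self.sink)(&self.buf);
               self.buf.clear();
           }
       }
   }
   
   fn main() {
       let mut calls: Vec<usize> = Vec::new();
       {
           // Record the size of each chunk the sink receives.
           let mut out = ThresholdBuffer::new(10, |chunk: &[u8]| calls.push(chunk.len()));
           for row in ["a,1\n", "b,2\n", "c,3\n", "d,4\n"] {
               out.push(row.as_bytes());
           }
           out.flush(); // send the final partial chunk
       }
       println!("{:?}", calls); // prints [12, 4]
   }
   ```
   
   A larger threshold means fewer, bigger writes to the sink at the cost of more memory held per writer, which is the trade-off that makes the optimal size sink-dependent.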
   
   > If you are also satisfied with the result of buffered version, I will add this functionality into CSV and JSON
   
   Thus far we have managed to avoid async within arrow-rs, and I think this encourages a nice separation of compute and IO. What do you think about adding this functionality to DataFusion instead, and perhaps just adding a doc comment to arrow-rs showing how it can be done?
   
   For example, something like the following (not tested):
   
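   ```rust
   use std::future::Future;
   use arrow_array::RecordBatch;
   use arrow_csv::Writer;
   use arrow_schema::ArrowError;
   /// Encode each batch to CSV in memory, then await `flush` on the bytes.
   async fn write_async<I, F, Fut>(batches: I, flush: F) -> Result<(), ArrowError>
   where
       I: IntoIterator<Item = RecordBatch>,
       F: Fn(&[u8]) -> Fut,
       Fut: Future<Output = Result<(), ArrowError>>,
   {
       let mut buffer = Vec::with_capacity(4096);
       for batch in batches {
           {
               // Scope the writer so it is dropped (flushing into `buffer`)
               // and its mutable borrow ends before `buffer` is read.
               // Note: each new Writer emits a header row by default.
               let mut writer = Writer::new(&mut buffer);
               writer.write(&batch)?;
           }
           flush(&buffer).await?;
           buffer.clear();
       }
       Ok(())
   }
   ```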
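   The same encode-then-flush pattern can be exercised without arrow or an async runtime. This std-only sketch rests on several stand-in assumptions: `encode_batch` is a hypothetical helper playing the role of the CSV `Writer`, rows are `(&str, i64)` tuples rather than a `RecordBatch`, errors are `String` rather than `ArrowError`, and `block_on` is a toy executor that only works for futures that never actually wait. It also shows one consequence of the `F: Fn(&[u8]) -> Fut` bound: the returned future cannot borrow the slice, so the callback copies the bytes before going async.
   
   ```rust
   use std::cell::RefCell;
   use std::future::Future;
   use std::io::Write;
   use std::pin::pin;
   use std::sync::Arc;
   use std::task::{Context, Poll, Wake, Waker};
   
   /// Toy executor: busy-polls with a no-op waker. Only suitable for
   /// futures that never actually wait, as in this example.
   struct NoopWake;
   
   impl Wake for NoopWake {
       fn wake(self: Arc<Self>) {}
   }
   
   fn block_on<F: Future>(fut: F) -> F::Output {
       let waker = Waker::from(Arc::new(NoopWake));
       let mut cx = Context::from_waker(&waker);
       let mut fut = pin!(fut);
       loop {
           if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
               return out;
           }
       }
   }
   
   /// Hypothetical stand-in for the CSV `Writer`: synchronously encodes rows.
   fn encode_batch(rows: &[(&str, i64)], out: &mut Vec<u8>) -> std::io::Result<()> {
       for (name, value) in rows {
           writeln!(out, "{name},{value}")?;
       }
       Ok(())
   }
   
   /// Same shape as `write_async` above, with `String` errors instead of `ArrowError`.
   async fn write_async<I, F, Fut>(batches: I, flush: F) -> Result<(), String>
   where
       I: IntoIterator<Item = Vec<(&'static str, i64)>>,
       F: Fn(&[u8]) -> Fut,
       Fut: Future<Output = Result<(), String>>,
   {
       let mut buffer = Vec::with_capacity(4096);
       for batch in batches {
           // Compute: encode into the in-memory buffer.
           encode_batch(&batch, &mut buffer).map_err(|e| e.to_string())?;
           // IO: hand the encoded bytes to the async sink.
           flush(&buffer).await?;
           buffer.clear();
       }
       Ok(())
   }
   
   fn main() {
       let sink = RefCell::new(Vec::<u8>::new());
       let sink_ref = &sink;
       let batches = vec![vec![("a", 1), ("b", 2)], vec![("c", 3)]];
       let result = block_on(write_async(batches, move |bytes: &[u8]| {
           // The future cannot borrow `bytes`, so copy them before going async.
           let chunk = bytes.to_vec();
           async move {
               sink_ref.borrow_mut().extend_from_slice(&chunk);
               Ok::<(), String>(())
           }
       }));
       result.unwrap();
       print!("{}", String::from_utf8(sink.into_inner()).unwrap());
   }
   ```
   
   Keeping the encoding synchronous and pushing only the final byte hand-off behind an `async` callback is what lets the encoder itself stay free of any async dependency.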
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
