tustvold commented on issue #3740:
URL: https://github.com/apache/arrow-rs/issues/3740#issuecomment-1441803061
> If you are also satisfied with the result of buffered version
The performance across all of them seems basically comparable. It would be
interesting to see a profile, but I suspect the difference comes down to the
sizing of the intermediate buffer, and the optimal size will depend heavily on
the destination sink.
> If you are also satisfied with the result of buffered version, I will add
this functionality into CSV and JSON
Thus far we have managed to avoid async within arrow-rs, and I think this
encourages a nice separation of compute and IO. What do you think about adding
this functionality instead to DataFusion and perhaps just adding a doc comment
to arrow-rs showing how it can be done?
e.g. something like (not tested)
```
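use arrow::{csv::Writer, error::ArrowError, record_batch::RecordBatch};
use std::future::Future;

async fn write_async<I, F, Fut>(batches: I, flush: F) -> Result<(), ArrowError>
where
    I: IntoIterator<Item = RecordBatch>,
    F: Fn(&[u8]) -> Fut,
    Fut: Future<Output = Result<(), ArrowError>>,
{
    let mut buffer = Vec::with_capacity(4096);
    for batch in batches {
        {
            // Assumes the CSV `Writer`, whose `write` takes `&RecordBatch`.
            // The inner scope ends the borrow of `buffer` before flushing.
            let mut writer = Writer::new(&mut buffer);
            writer.write(&batch)?;
        }
        flush(&buffer).await?;
        buffer.clear();
    }
    Ok(())
}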
```
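For anyone who wants to try the shape of this without pulling in arrow or an async runtime, here is a rough std-only sketch of the same idea: encode synchronously into a buffer, then hand the bytes to an async sink. Everything in it is a stand-in and not arrow-rs API: `Batch`, `encode_csv`, `block_on` and `demo` are hypothetical names made up for illustration. It also swaps `Fn(&[u8])` for a callback that takes an owned `Vec<u8>`, since a future returned from a `Fn(&[u8]) -> Fut` closure cannot hold on to the borrowed slice; that costs one allocation per batch instead of reusing the buffer.

```rust
use std::future::Future;
use std::io::Write;
use std::pin::pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for a RecordBatch: rows of string cells.
type Batch = Vec<Vec<String>>;

/// Synchronous encoding step (stand-in for a CSV writer over `impl Write`).
fn encode_csv<W: Write>(mut out: W, batch: &Batch) -> std::io::Result<()> {
    for row in batch {
        writeln!(out, "{}", row.join(","))?;
    }
    Ok(())
}

/// Encode each batch synchronously, then pass the bytes to an async sink.
/// The encoder itself never sees async; only `flush` does IO.
async fn write_async<I, F, Fut>(batches: I, mut flush: F) -> std::io::Result<()>
where
    I: IntoIterator<Item = Batch>,
    F: FnMut(Vec<u8>) -> Fut,
    Fut: Future<Output = std::io::Result<()>>,
{
    for batch in batches {
        let mut buffer = Vec::with_capacity(4096);
        encode_csv(&mut buffer, &batch)?;
        flush(buffer).await?;
    }
    Ok(())
}

/// Waker that does nothing; enough for futures that never actually suspend.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor for the demo: polls in a loop until the future is ready.
/// A real program would use a proper runtime instead.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

/// Writes two small batches into an in-memory "sink" and returns its contents.
fn demo() -> Vec<u8> {
    let sink = Arc::new(Mutex::new(Vec::new()));
    let batches: Vec<Batch> = vec![
        vec![
            vec!["a".to_string(), "1".to_string()],
            vec!["b".to_string(), "2".to_string()],
        ],
        vec![vec!["c".to_string(), "3".to_string()]],
    ];
    let handle = Arc::clone(&sink);
    block_on(write_async(batches, move |bytes| {
        let handle = Arc::clone(&handle);
        async move {
            handle.lock().unwrap().extend_from_slice(&bytes);
            Ok::<(), std::io::Error>(())
        }
    }))
    .unwrap();
    let written = sink.lock().unwrap().clone();
    written
}
```

Calling `demo()` from a `main` function runs both batches through the in-memory sink. In a real setup the closure would presumably wrap something like an async file or object store write, which is the part that could live in DataFusion.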