ShiKaiWi commented on issue #1269: URL: https://github.com/apache/arrow-rs/issues/1269#issuecomment-1469600914
@tustvold Thanks for your quick response. Here are some of my thoughts on your proposal:

> This would then allow providing a `Vec<u8>` as the writer, and then periodically gaining access to it and flushing its contents asynchronously. We could provide an `AsyncWriter` that encapsulates this logic, but we could also just provide a code example in a doc comment.

I think **periodically gaining access** is not an elegant approach, although it does solve the problem we have run into. The difficulty is that a suitable polling frequency is hard to choose. Polling too often may add overhead. Polling too rarely lets the inner `Vec<u8>` writer grow and consume a lot of memory.

> The nature of parquet is that an entire row group is buffered up and written in one shot, as data for different columns cannot be interleaved, so I'm not sure it is possible to do better than this

I don't know the details of the writer very well yet. Still, I think the buffering logic could be kept inside the async writer itself.
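Below is a minimal, std-only sketch of the trade-off discussed above. It assumes a synchronous encoder that accepts any `std::io::Write`. The names `SharedBuffer` and `drain_if_over` are hypothetical and are not part of the parquet crate's API. The `Vec<u8>` sink stands in for an async destination such as object storage.

Instead of draining on a timer, the sketch drains whenever the buffered size reaches a byte threshold. This way the memory bound comes from an explicit parameter rather than from a guessed polling interval.

```rust
use std::io::{self, Write};
use std::sync::{Arc, Mutex};

/// Hypothetical cloneable in-memory writer: one handle goes to the
/// synchronous encoder, the other stays with the caller for draining.
#[derive(Clone, Default)]
struct SharedBuffer {
    inner: Arc<Mutex<Vec<u8>>>,
}

impl SharedBuffer {
    /// Number of bytes currently buffered.
    fn len(&self) -> usize {
        self.inner.lock().unwrap().len()
    }

    /// Removes and returns all buffered bytes, leaving the buffer empty.
    fn take(&self) -> Vec<u8> {
        std::mem::take(&mut *self.inner.lock().unwrap())
    }
}

impl Write for SharedBuffer {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.inner.lock().unwrap().extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Hypothetical helper: drains the buffer only once it holds at least
/// `threshold` bytes, rather than on a fixed polling interval. In a real
/// async writer the drained bytes would be awaited into the async sink
/// (e.g. with tokio's `AsyncWriteExt::write_all`); here a `Vec<u8>` stands in.
fn drain_if_over(buf: &SharedBuffer, threshold: usize, sink: &mut Vec<u8>) -> bool {
    if buf.len() >= threshold {
        sink.extend_from_slice(&buf.take());
        true
    } else {
        false
    }
}

fn main() {
    let buffer = SharedBuffer::default();
    // Stand-in for the writer handle owned by the synchronous encoder.
    let mut encoder_writer = buffer.clone();
    // Stand-in for the async destination (e.g. object storage).
    let mut sink: Vec<u8> = Vec::new();

    // Simulated bursts of encoded output, e.g. one per flushed row group.
    let chunks: [&[u8]; 3] = [b"row-group-1", b"rg2", b"row-group-3"];
    let mut drains = 0;
    for chunk in chunks.iter() {
        encoder_writer.write_all(chunk).unwrap();
        if drain_if_over(&buffer, 8, &mut sink) {
            drains += 1;
        }
    }
    // Whatever remains below the threshold is drained when the writer closes.
    sink.extend_from_slice(&buffer.take());

    assert_eq!(sink, b"row-group-1rg2row-group-3".to_vec());
    println!("drains: {}, total bytes: {}", drains, sink.len());
}
```

With a threshold of 8 bytes, the three writes trigger two drains, one after the first chunk and one after the third. In general, the memory held by the shared buffer stays below the threshold plus the largest single write, with no timer involved. The encoder's own internal buffering of a row group is not affected by this.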
